
Part 2: I Trained an AI to Measure Luck in Brisca

Hector C. Ortiz

[Header image: briscas with statistical lines in the back]

In Part 1, I ran an experiment: an “advanced” bot versus a completely random player.

The result? Randomness wins about 30% of the time.

Which was… not ideal.

To be fair, the “advanced” bot was basically a few if statements and a dream. Calling it strategy was generous.

Maybe a smarter player could do better.

So, like any reasonable developer, I responded calmly and proportionally:

I trained a neural network to play Briscas.

The Plan

The idea is straightforward.

Instead of carefully crafting rules like “don’t throw away your aces” and pretending that counts as artificial intelligence, I let an algorithm play thousands of games against itself and discover the strategy on its own.

This approach is called reinforcement learning. The AI plays a game, gets a snack for winning, has its PlayStation taken away for losing, and slowly becomes a better player.

Think of it like a baby learning to walk.

Except the baby is a neural network, and instead of falling over repeatedly, it’s losing thousands of games of Brisca until it eventually stops embarrassing itself.

I actually trained two agents:

  1. Best agent: does its absolute best to win
  2. Worst agent: does its absolute best to lose

Yes, I trained an AI whose job is to be bad at Brisca.

Why?

Because it tells us the floor. If even an AI that is actively trying to lose still manages to win some games, then those wins are pure luck. Not even intentional incompetence could avoid them.

How It Works

The Environment

The first challenge was explaining the game to the neural network.

Neural networks don’t know what “the 3 of Oros” means. They don’t know what a card is, what a suit is, or why humans get emotionally invested in pieces of cardboard.

So I built a translator that converts the game state into a vector of 50 numbers:

  • 3 values for the cards in your hand
  • 2 values for the trump card (card ID + suit)
  • 2 values for cards currently on the table
  • 40 binary flags, a bitmap tracking every card that's been played so far (memory!)
  • 1 value for how many cards remain in the deck
  • 2 values for the current scores (yours and your opponent's)
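
In code, that translator looks roughly like this. This is a minimal sketch rather than my exact implementation: the slot layout and the normalization choices are assumptions, but the 50-value shape matches the list above.

```python
import numpy as np

NUM_CARDS = 40      # Spanish deck: 4 suits x 10 ranks
NUM_SUITS = 4
TOTAL_POINTS = 120  # every Brisca game deals out 120 points in total

def encode_state(hand, trump, table, played, deck_count, my_score, opp_score):
    """Flatten the game state into the 50-value observation vector."""
    vec = np.zeros(50, dtype=np.float32)

    # 3 values: the cards in your hand (empty slots stay 0 in this sketch)
    for i, card in enumerate(hand):
        vec[i] = card / NUM_CARDS

    # 2 values: the trump card (card ID + suit)
    vec[3] = trump[0] / NUM_CARDS
    vec[4] = trump[1] / NUM_SUITS

    # 2 values: cards currently on the table
    for i, card in enumerate(table):
        vec[5 + i] = card / NUM_CARDS

    # 40 binary flags: one per card, flipped to 1 once it has been played
    for card in played:
        vec[7 + card] = 1.0

    # 1 value: how many cards remain in the deck
    vec[47] = deck_count / NUM_CARDS

    # 2 values: the current scores (yours and your opponent's)
    vec[48] = my_score / TOTAL_POINTS
    vec[49] = opp_score / TOTAL_POINTS

    return vec
```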

That bitmap is the key advantage over the heuristic bot from Part 1. The heuristic bot has the memory of a goldfish. It doesn’t remember which cards have been played.

The RL agent, on the other hand? It remembers. It knows that if three of the four aces have already left the building, the last one is basically strutting around.

That's the kind of card-counting that good human players do instinctively, and that casinos would kick you out for.

The Action Space

The action space, or what the model can do, is almost insultingly simple: pick card 0, 1, or 2 from your hand. That's it. Just point at a card like a toddler choosing a chicken nugget and call it a day.

The cards are sorted by value, because the AI said so, and frankly, who are we to argue? It runs the whole operation around here.
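
As a sketch (the point values below are standard Brisca scoring; whether the sort runs ascending is my assumption):

```python
# Standard Brisca point values: ace = 11, three = 10, king = 4, knight = 3, jack = 2
POINTS = {1: 11, 3: 10, 12: 4, 11: 3, 10: 2}

def card_points(card):
    rank, _suit = card
    return POINTS.get(rank, 0)  # every other rank is worth nothing

def choose_card(hand, action):
    """The entire action space: an index (0, 1, or 2) into the value-sorted hand."""
    hand = sorted(hand, key=card_points)
    return hand[action]  # when fewer than 3 cards remain, actions are masked to len(hand)
```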

The Algorithm

I used DQN (Deep Q-Network), which works by learning a "quality score" for every action in every situation.

It's like if you played 10,000 games of cards, but instead of getting better at bluffing or reading people, you just quietly filled an enormous notebook with eerily specific statistics.

Things like "if I have these cards, and this is the trump suit, and these cards have already been played, then playing card 2 is worth +0.7 points on average."

Which, honestly, sounds less like artificial intelligence and more like a particular type of person we've all met at game night. You know the one.
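
That notebook has a formal name: the Q-function. The rule for filling it in is the standard DQN target, which in code is basically a one-liner (a sketch; the discount factor here is an assumed value, not one from my training run):

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    """Bellman target: r + gamma * max_a' Q(s', a'), zeroed out on terminal states."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q
```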

The network architecture is two hidden layers of 256 neurons each, tiny by modern standards.

But Briscas is a relatively simple game, so we don't need to spin up a data center just to play cards.
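
In PyTorch terms, that architecture fits in about ten lines (a sketch; the framework choice and everything beyond the two 256-neuron hidden layers are assumptions):

```python
import torch.nn as nn

class BriscaDQN(nn.Module):
    """50 state values in, one Q-value per hand slot out."""

    def __init__(self, state_dim=50, n_actions=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)
```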

The Reward

The reward signal is how the agent knows if it did something good or catastrophically stupid.

The reward signal here is just the final score, squished into a [-1, +1] range. Win big: +1. Lose badly: -1.

The best agent spends its whole life chasing +1. The worst agent is doing the exact same thing, but in the opposite direction.
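
A sketch of that reward, assuming a simple linear squash of the final margin (any monotonic map into [-1, +1] would do; the exact squashing isn't the interesting part):

```python
def final_reward(my_points, opp_points, trained_to_lose=False):
    """Final score margin squashed into [-1, +1].

    A Brisca game deals out 120 points total, so the margin lives in
    [-120, +120] and a linear map lands it exactly in [-1, +1].
    """
    r = (my_points - opp_points) / 120.0
    return -r if trained_to_lose else r  # the worst agent just flips the sign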

The Results

With both agents trained, I ran 10,000 evaluation games for each matchup.
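
The evaluation loop itself is nothing fancy; something in this spirit, where `play_game` is a stand-in for one full game between two greedy (no-exploration) agents:

```python
def evaluate(play_game, n_games=10_000):
    """Tally wins and ties over n_games.

    play_game: a callable returning (score_a, score_b) for one full game.
    """
    wins_a = wins_b = ties = 0
    for _ in range(n_games):
        score_a, score_b = play_game()
        if score_a > score_b:
            wins_a += 1
        elif score_b > score_a:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties
```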

Best Agent vs Random

                 Best Agent   Random
Games Won        8163         1712
Percentage Won   81.63%       17.12%
Games Tied       125          125
Percentage Tied  1.25%        1.25%

The agent beats random 81.6% of the time, a massive upgrade over the 68.5% the heuristic bot managed in Part 1. Card memory matters, it turns out. Who would have guessed?

But random still wins 17% of the time.

An agent that has played thousands of training games, tracked every card, and runs a neural network under the hood... still loses to a bot that has never had a single thought in its life.

Worst Agent vs Random

                 Worst Agent   Random
Games Won        2018          7868
Percentage Won   20.18%        78.68%
Games Tied       114           114
Percentage Tied  1.14%         1.14%

An agent specifically trained to lose still wins 20% of its games.

It doesn't want to win. It has been explicitly, lovingly, carefully rewarded for losing.

And yet, one in five times, the card gods look down, shake their heads, and say "no, you're winning this one whether you like it or not."

Best Agent vs Worst Agent

                 Best Agent   Worst Agent
Games Won        9336         607
Percentage Won   93.36%       6.07%
Games Tied       57           57
Percentage Tied  0.57%        0.57%

The worst agent, despite its best efforts to throw every single game, still accidentally wins 6.1% of the time.

And honestly? It's not its fault.

Those games were impossible to lose.

The cards were dealt, the outcome was sealed before a single move was made. The worst agent could have been replaced by a particularly confused pigeon and gotten the exact same result.

Conclusions

So is Briscas a game of skill? Yes: the best agent beats random about four times as often as the worst agent does. But is it also a game of luck? Absolutely. Roughly 1 in 5 games is decided before the first card is played: that's the worst agent's 20% win rate, the games that not even deliberate sabotage could throw.

My friend who lost to a beginner? That's not an anomaly. That's just Briscas working as intended.