Previously in Sequence (Required): Simplified Poker
I spent a few hours figuring out my strategy. This is what I submitted.
If you start with a 2, you never want to bet, since your opponent will call with a 3 but fold with a 1. So we can assume no one who bets ever has a 2. But you might want to call a bet.
If you start with a 1, you never call a bet, but sometimes want to bet as a bluff.
If you start with a 3 in first position, sometimes you may want to check to allow your opponent to bet with a 1. If you have a 3 in second position, you have no decisions.
Thus, a non-dominated strategy can be represented by five probabilities: The chance you bet with a 1 in first position, chance you bet with a 3 in first position, chance you bet with a 1 in second position, chance you call with a 2 in first position, and chance you call with a 2 in second position. Call a set of these five numbers a strategy.
There were likely to be a few players bad enough to bet with a 2 or perhaps make the other mistakes, but I chose for complexity reasons not to worry about that, assuming I’d still do something close to optimal. If I was confident complexity was free, I’d have included a check to see if we ever caught the opponent doing something crazy, and adjust accordingly.
If you know the opposing strategy, what to do is obvious. Thus, I defined a function called ‘best response’ that takes a strategy, and outputs the strategy that maximizes against that strategy.
My goal was to derive the opponents’ strategy, then play the best response to that strategy.
As a safeguard against opponents who were anticipating such a strategy, I included an escape hatch: If at any point, my opponent got ahead by 10 or more chips, assume they were a level ahead of me, and playing the best response to what I would otherwise do. So derive what that is, and play the best response to that!
That skipped over the key puzzle, which is figuring out what the opponent is doing. On the first turn, I guessed opponents would pursue reasonable mixed strategies: bet a 1 about a third of the time, bet a 3 in first position about two thirds of the time, call with a 2 about half the time. I represented this with a virtual hand history that I included until I had enough real ones.
On subsequent turns, I looked at the hand history.
If the opponents’ card was revealed, that was a pure data point – if we knew they bet with a 1, that’s a hand where they did that.
If the opponents’ card wasn’t revealed, but only one card made any sense, I assumed they had that card. Thus, if I bet with a 1 and they fold, I assume they had a 2.
If the opponents’ card wasn’t revealed, and they could have had either card because you bet a 3 and they folded, or they bet and you folded a 2, that’s trickier. The probability of them having each card in that spot depends on their strategy. And again, there was a (unknown soft) complexity limit.
My solution was to assume that in each unique starting position (your position plus your card) half the time my opponent would draw the higher of the two cards I hadn’t drawn, and half the time he’d draw the lower one. So half the time I have a 2 in first position, he has a 3, half the time he has a 1.
That was definitely not ideal, and I don’t remember exactly how I did it, but it definitely did the thing it was designed to do: Identify exploitable agents lightning fast, and do something reasonable against reasonable ones. Trying to optimize the details of this type of approach is an interesting puzzle, both with and without a complexity limitation.