Epistemic Status: True story

This post intends to begin the sequence Zbybpu’f Nezl.

In college I once took a class called Rational Choice. Because obviously.

Each week we got the rules for, played and discussed a game. It was awesome.

For the grand finale, and to determine the winner of the prestigious Golden Shark Award for best overall performance, we submitted computer program architectures (we’d tell the professor what our program did, within reason, and he’d code it for us) to play The Darwin Game.

The Darwin Game is a variation slash extension of the iterated prisoner’s dilemma. It works like this:

For the first round, each player gets 100 copies of their program in the pool, and the pool pairs those programs at random. You can and often will play against yourself.

Each pair now plays an iterated Nash bargaining game, as follows. Each turn, each player simultaneously submits an integer number from 0 to 5. If the two numbers add up to 5 or less, each player earns points equal to their own number. If the two numbers add up to 6 or more, neither player gets points. This game then lasts for a large but unknown number of turns, so no one knows when the game is about to end; for us this turned out to be 102 turns.

Each pairing is independent of every other pairing. You do not know what round of the game it is, whether you are facing a copy of yourself, or any history of the game to this point. Your decision algorithm does the same thing each pairing. You do know the results of previous turns in the same pairing.

At the end of the round, all of the points scored by all of your copies are combined. Your percentage of all the points scored by all programs becomes the percentage of the pool your program gets in the next round. So if you score 10% more points, you get 10% more copies next round, and over time successful programs will displace less successful programs. Hence the name, The Darwin Game.

Your goal is to have as many copies in the pool at the end of the 200th round as possible, or failing that, to survive as many rounds as possible with at least one copy.

If both players coordinate to split the pot, they will score 2.5 per round.

To create some common terminology for discussions, ‘attack’ or means to submit 3 (or a higher number) more than half the time against an opponent willing to not do that, and to ‘fold’ or ‘surrender’ is to submit 2 (or a lower number) more than half the time, with ‘full surrender’ being to always submit 2. To ‘cooperate’ is to alternate 2 and 3 such that you each score 2.5 per round.

In this particular example we expected and got about 30 entries, and I was in second place in the points standings, so to win The Golden Shark, I had to beat David by a substantial amount and not lose horribly to the students in third or fourth.

What program do you submit?

(I recommend actually taking some time to think about this before you proceed.)

Some basic considerations I thought about:

1. The late game can come down to very small advantages that compound over time.

2. You need to survive the early game *and *win the late game. This means you need to succeed in a pool of mostly-not-smart programs, and then win in a pool of smart programs, and then in a pool of smart programs that outlasted other smart programs.

3. Scoring the maximum now regardless of what your opponent scores helps you early, but kills you late. In the late game, not letting your opponent score more points than you is very important, especially once you are down to two or three programs.

4. In the late game, how efficiently you cooperate with yourself is very important.

5. Your reaction to different programs in the mid game will help determine your opponents in the end game. If an opponent that outscores you in a pairing survives into the late game, and co-operates with itself, you lose.

6. It is all right to surrender, even fully surrender, to an opponent if and only if they will be wiped out by other programs before you become too big a portion of the pool, provided you can’t do better.

7. It is much more important to get to a good steady state than to get there quickly, although see point one. Getting people to surrender to you would be big game.

8. Some of the people entering care *way *more than others. Some programs will be complex and consider many cases and be trying hard to win, others will be very simple and not trying to be optimal.

9. It is hard to tell what others will interpret as cooperation and defection, and it might be easy to accidentally make them think you’re attacking them.

10. There will be some deeply silly programs out there at the start. One cannot assume early programs are doing remotely sensible things.

That leaves out many other considerations, including at least one central one. Next time, I’ll go over my and David’s preparations, and post three will reveal what happened on game night.

Note: Please do not comment here once you have read The Darwin Pregame.

One obvious consideration is that you should test the programs you think best a priori against a bunch of programs (including some obvious and silly ones for each simulated tournament, as well as a few that did well in previous simulated tournaments).

I.e. do so *before* deciding which program to submit.

@Pip: If the same group were to play again the result would be both almost certainly different and hard to predict. There’s some chance a similar thing would happen again but it would be pretty weird.

Indeed, depending on how much work you put into it, simulating tournaments might be a very good idea. I will say that the game is *very* sensitive to the initial conditions of the player pool, and what happened was not at all inevitable.

How much variance would you expect in the outcomes if the same group were to play again?

Pingback: Rational Feed – deluks917

How much time do we have until “next time” Professor Mowshowitz?

By design a few days, but it will be longer if I can’t find time to write and/or the discussion proves sufficiently interesting I don’t want to anchor/spoil it just yet.

Some other considerations – I’m assuming (which you implied but didn’t state) that your program has memory of prior turns in the same round.

(1) It’s worth figuring out what the optimal strategy is to play against a copy of yourself. Since there’s (presumably) no way to index which side is which, probably the optimal strategy is to play 2 with some probability, and 3 with some probability, such that the expected gains from ever playing 3 outweigh the losses from sometimes going over 5.

Let p be the probability you play a 2, and 1-p be the probability you play a 3. Then the EV of each round played against yourself is 2 * p + 3 * (1 – p) * p = 5p – 3p^2. This is maximized at p = 5/6.

(2) To successfully identify copies of yourself, you can burn the first few turns on a fixed sequence. In a 30-player tournament, three turns might be enough, since there’s only a 13% chance of a spurious match with one of the other players. Four turns cuts that by another factor of 6. You have to commit to sticking with the sequence before you actually generate three or four successive random numbers from 0 to 5.

This might not be exactly optimal – for instance, you might want to lose fewer initial points against yourself by limiting yourself to a lower range, and accept less information per turn – but it should be easy to figure out the cost-benefit situation for any such tradeoff.

(3) If you recognize an exact copy of yourself from the initial handshake, then you can do better than the strategy in (1) by generating an index. Each side plays a number from 0 to five, randomized on the fly (rather than once in advance, like the handshake). Once one side has played a number that’s higher than the other, you can deterministically have the first side play 3 and the second side play 2. This all gets added to your pool, and it doesn’t matter which side scored more. If the other side deviates from this strategy at all, then you can write off the initial correspondence as a lucky-for-them coincidence, and treat them like an enemy.

This might not be exactly optimal, but it should again be easy to figure out what the exactly optimal strategy is, again

(3) It’s worth figuring out what the “tit for tat” analogue is here. It can’t just be playing your opponent’s last move, since your opponent could play a cycle of 2-3-5 and gain at your expense. Instead, it has to be inductive, where you start out playing some mix of 2s and 3s, settle into an alternating 2-3 pattern if they do, but learn and retaliate in kind if their strategy substantially deviates from this.

(4) If you use an initial handshake against a player who is not a copy of you, you can learn something from their response – and you probably want to make sure they don’t “learn” the wrong lesson about

you. For instance, if your handshake is noticeably aggressive, you might want to be extra careful to forgive in the next few turns.An obvious alternative to (3) would be to play either 2 or 3 for the indexing turns, which trades off fewer lost turns against slower learning. Supposing you play 2 with probability p, each turn you have a p(1-p) chance of getting on average 2.5 points each and learning, a p^2 chance of getting 2 points each and extending your indexing period by a turn and a (1-p)^2 chance of getting 0 points and extending your indexing period by a turn.

At this point, I think most of the important stuff I haven’t thought through is how to treat players who aren’t you.

Edited to confirm that you know the previous turns in the pairing, as you assumed.

For the rest, choosing not to say anything right now, other than to confirm that yes if you’re up against a copy you want to be damn sure you get 5 points out of almost all turns.

I don’t think your exploitation of tit for tat doesn’t give an advantage except in the late game. The opponent would score 0-3-0 points, versus 2-0-0 for the tit-for tat player. If we’re dealing with a zero-sum situation with only two players, the exploiter has an edge, but I don’t expect this exploiter to get to the late game unless the meta is so uncooperative that 1 point per turn average is good.

Not 100% sure I get the rules. If A + B >5 no one gets points, but if A+B<=5 both players get A+B points.

So, as I consider it, there is not direct single game way to gain at other’s expense. In a prisoner’s dilemma, if you defect when they cooperate you get more points than them. Here if you submit 3 and they submit 2, you both get 5 points. They don’t get an advantage by submitting larger numbers.

You can’t reliably tank the other players score by submitting large numbers all the time, because they could submit zero and you can’t submit something larger than 5.

You can’t reliably tank the other players score by submitting zero all the time, because they could submit 5.

You probably want some protocol to make sure you always get 5 when you play yourself. Using some code for the first few moves is probably fine for this. With 6 choices 0-5, two moves is a 1/36 chance, not quite good enough with 30 bots, so you probably spend 3 moves identifying self.

You probably minimize total points against non self by submitting a random mix of zeros and 5s. 5 means they get no points with anything higher than zero, and zero means you don’t contribute to their point total. Picking the right ratio of 0s and 5s would require testing, but that ratio is the key to your attack so its very important. I think random might be better than smart here. You can't out predict random.

You might be able to improve this by also detecting a few obvious clean strategies (always submit 2, submit 5 minus other players last turn etc.) and maximizing point value in those match ups if you think they are likely prevalent in the early stages so you can wrack up some percentage early to increase the value of your self-playing later. Maximizing those is probably ok if other “smart” programs will prey on them in the mid game. If there are no smart programs preying on them though dumb cooperators will beat you if you spend 3 turns detecting before cooperating with dumb cooperators . So that's a risk, I'd want to see a few programs that preyed on these before relying on their existence. Assuming I found a few then I'd apply the logic form this paragraph. If I didn't find anything in the prey on dumb cooperators role, then I would scrap this part and just have a simple self / other logic.

Assuming the attack portion is tune right, this comes close to maxing points form self-play and minimizing points vs smart bots, which means if the pairings give you a larger population percentage than the other smart bots at any point you probably win from there.

This loses to a program that does exactly the same thing but spends 2 turns on handshake instead of 3.

I cleaned up the language a bit to clarify; if I submit 3 and you submit 2, I score 3 and you score 2. We don’t both get 5. A gets A points, B gets B, not A+B for both.

Ok, I was working from an incorrect premise, so most of my analysis will be wrong as written.

“Each turn, each player simultaneously submits a number from 0 to 5.” <– from context I think you mean just integers 0 to 5?

Yes. I will change it to integer to be clear.

Probably worth noting here that the game being interated here is the Nash bargaining game; it’s a bit different from the Prisoner’s Dilemma. I feel like it has more in common with Chicken/Snowdrift, though not sure whether that’s really the best fit either.

Quite right, I changed post to reflect that. They all flow into each other if you’re not careful! I think my brain was thinking of it as *effectively* a PD-style situation at a certain level, but calling it that as such is misleading.

I think it is worth the effort to figure out what the tit-for-tat equivalent would be, i.e. which strategy is super nice but punishes you if you try to screw it over. My starting point is something like “if you have strictly more points than I do I will play 3, otherwise I will play 2”.

This is nice because it lets you have 2.5 points if you alternate 3 and 2 with it. But if you insist on playing 3 every turn you will get no points at all. Also it is nice because it lets you go back by playing 2. One could make it more punishing and saying “if you didn’t cooperate for n turns I will start playing 4 to make my demands clear”. Then if you want to go back to alternating 2 and 3 you will have to play 1. Not sure if this has any worth though.

Then of course you need to implement an initial “hey we are the same program let’s cooperate and give us 5 points each round” in the beginning but after that I could see this actually working.

Start with a short identifying sequence of 0s,1s,4s, and 5s. Seeing any 2 or 3 drops into oppositional mode immediately. Have the last two characters of this sequence only convey one bit of identification each; {0,5} vs. {1,4}. Determine which of them to use randomly, to establish who will be the low side and who the high side. This gives you a 75% handshake chance and 75% polarity-match chance, vs. 75/50 for finding them separately, at a cost of about 1 point in expectation.

In adversarial cases, start out with literal tit-for-tat; play the last number the opponent played. Come up with a short list of possible default strategies; always-2, always-3, always-N, 0/5-alternating, 2/3-alternating, 1/4-alternating. Probably a couple others. If the output doesn’t look quite like tit-for-tat, and the discrepancy with one of these other simple strategies is smaller, play the FDT-best-response strategy. For alternating strategies it’s the same alternation but in reverse, for always-N for N2 it’s to play always-3. This will involve several special cases, but they only need to handle common strategies so it’s limited.

Actually, skip the handshake code. This will get optimality against itself without that by starting out with random 2/3 for ~4 rounds and then proceeding from there.

Pingback: The Darwin Pregame | Don't Worry About the Vase

One way to cooperate with yourself would be to start with a small shibboleth (some random sequence like 23223332). 8 bits is a reasonable amount, though I’m still a little scared you end up colliding. Maybe 10 bits? Starting with a random sequence doesn’t actually seem like it hurts you that much, though maybe it could be a problem against Grudger-style opponents.

Also, if you’re going to make a shibboleth, you might even add some 0s, 1s, 4s, or 5s in there. If you make a good sequence including these, you could fool opponents that try to learn from you, or learn to take advantage of interesting patterns yourself. But adding a bunch of bells and whistles like this may not help too much, because they take advantage of fringe strategies that don’t occur that much themselves.

If you had the ability to run your own simulation (like with a genetic algorithm), you could do that and see what strategies tend to win by landslides against a lot of different strategies. Probably any simulations you’d make would be pretty far from the actual pool of strategies chosen by your classmates. I’d expect simulations to be more accurate in the late game than in the early game though, which is what matters.

Pingback: The Darwin Results | Don't Worry About the Vase

Pingback: Book Review: The Elephant in the Brain | Don't Worry About the Vase