Simplified Poker Conclusions

Previously: Simplified Poker, Simplified Poker Strategy

Related (Eliezer Yudkowsky): Meta Honesty: Firming Honesty Around Its Edge Cases

About forty people submitted programs that used randomization. Several of those programs correctly solved for the Nash equilibrium, and those did well.

I submitted the only deterministic program.

I won going away.

I broke even against the Nash programs, utterly crushed vulnerable programs, and lost a non-trivial amount to only one program, a resounding heads-up defeat handed to me by the only other top-level gamer in the room, fellow Magic: the Gathering semi-pro player Eric Phillips.

Like me, Eric had an escape hatch in his program that reversed his decisions (rather than retreating to Nash) if he was losing by enough. Unlike me, his actually got implemented – the professor decided that given how well I was going to do anyway, I’d hit the complexity limit, so my escape hatch was left out.

Rather than get into implementation details, or proving the Nash equilibrium, I’ll discuss two things: How few levels people play on, and the motivating point: How things are already more distinct and random than you think they are, and how to take advantage of that.

Next Level

In the comments to the first two posts, most people focused on finding the Nash equilibrium. A few people tried to do something that would better exploit obviously stupid players, but none tried to discover the opponent’s strategy.

The only reason not to play an exploitable strategy is if you’re worried someone will exploit it!

Consider thinking as having levels. Level N+1 attempts to optimize against Levels N and below, or just Level N.

Level 0 isn’t thinking or optimizing, so higher levels all crush it, mostly.

Level 1 thinking picks actions that are generically powerful, likely to lead to good outcomes, without considering what opponents might do. Do ‘natural’ things.

Level 2 thinking considers what to do against opponents using Level 1 thinking. You try to counter the ‘natural’ actions, and exploit standard behaviors.

Level 3 counters Level 2. You assume your opponents are trying to exploit basic behaviors, and attempt to exploit those trying to do this.

Level 4 counters Level 3. You assume your opponents are trying to exploit exploitative behavior, and acting accordingly. So you do what’s best against that.

And so on. Being caught one level below your opponent is death. Being one level ahead is amazing. Two or more levels different, and strange things happen.
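The level structure can be sketched with a toy model. This is an illustration under loud assumptions, not the contest game: rock-paper-scissors stands in for the real game, Level 0 is assumed to always play rock, and every name below (`best_response`, `level_action`, `matchup`) is made up for the example.

```python
# Toy illustration only: rock-paper-scissors stands in for the real game.
# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def best_response(action):
    """Return the action that beats the given action."""
    for candidate, loser in BEATS.items():
        if loser == action:
            return candidate
    raise ValueError(f"unknown action: {action}")


def level_action(level, level0_action="rock"):
    """Level 0 plays a fixed 'natural' action (an assumption).

    Level N+1 best-responds to what Level N would play.
    """
    action = level0_action
    for _ in range(level):
        action = best_response(action)
    return action


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie, from the first player's view."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def matchup(my_level, their_level):
    """Result when a player at my_level meets a player at their_level."""
    return payoff(level_action(my_level), level_action(their_level))
```

In this toy, `matchup(1, 0)` and `matchup(2, 1)` are wins and `matchup(0, 1)` is a loss, matching the one-level-ahead and one-level-behind cases. Two levels ahead, `matchup(2, 0)`, is a loss, and three levels ahead is a tie. Those exact outcomes are an artifact of a three-action cycle, but they show why being more than one level apart gets strange.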

Life is messy. Political campaigns, major corporation strategic plans, theaters of war. The big stuff. A lot of Level 0. Level 1 is industry standard. Level 2 is inspired, exceptional. Level 3 is the stuff of legend.

In well-defined situations where losers are strongly filtered out, such as tournaments, you can get glimmers of high level behavior. But mostly, you get it by changing the view of what Level 1 is. The old Level 2 and Level 3 strategies become the new ‘rules of the game’. The brain chunks them into basic actions. Only then can the cycle begin again.

Also, ‘getting’ someone with Level 3 thinking risks giving the game away. What level should one be on next time, then?

Effective Randomization

There is a strong instinct that whenever predictable behavior can be punished, one must randomize one’s behavior.

That’s true. But only from another’s point of view. You can’t be predictable, but that doesn’t mean you need to be random.

It’s another form of illusion of transparency. If you think about a problem differently than others, their attempts to predict or model you will get it wrong. The only requirement is that your decision process is complex, and doesn’t reduce to a simple model.

If you also have different information than they do, that’s even better.

When analyzing the hand histories, I know what cards I was dealt, and use that to deduce what cards my opponent likely held, and in turn guess their behaviors. Thus, my opponent likely has no clue either what process I’m using, how I implemented it, or what data I’m feeding into it. All of that is effective randomization.

If that reduces to me always betting with a 1, they might catch on eventually. But since I’m constantly re-evaluating what they’re doing, and reacting accordingly, on an impossible-to-predict schedule, such catching on might end up backfiring. It’s the same at a human poker table. If you’re good enough at reading people to figure out what I’m thinking and stay one step ahead, I need to retreat to Nash, but that’s rare. Mostly, I only need to worry, at most, if my actions are effectively doing something simple and easy to model.

Playing the same exact scenarios, or with the same exact people, or both, for long enough, both increases the amount of data available for analysis, and reduces the randomness behind it. Eventually, such tactics stop working. But it takes a while, and the more you care about long histories in non-obvious ways, the longer it will take.

Rather than being actually random, one adjusts when one’s behavior has deviated far enough from what would look random that others are likely to adjust to account for it. That adjustment, too, need not be random.

Rushing into doing things to mix up your play, before others have any data to work with, only leaves value on the table.

One strong strategy when one needs to mix it up is to do what the details favor. Thus, if there’s something you need to occasionally do, and today is an unusually good day for it, or now an especially good time, do it now, and adjust your threshold for that depending on how often you’ve done it recently.
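A minimal sketch of that threshold rule, assuming the ‘details’ can be boiled down to a single favorability score between 0 and 1. The class name, the parameters, and the linear penalty are all hypothetical choices for illustration; this is not the program I submitted.

```python
from collections import deque


class DetailDrivenMixer:
    """Deterministically decide when to take an occasional action (e.g. a bluff).

    Hypothetical sketch: `favorability` is any score in [0, 1] for how good
    the current moment is. The bar rises the more often we acted recently.
    """

    def __init__(self, base_threshold=0.5, penalty=0.4, window=10):
        self.base_threshold = base_threshold
        self.penalty = penalty
        self.recent = deque(maxlen=window)  # 1 if we acted, 0 otherwise

    def threshold(self):
        """Current bar: the base value plus a penalty scaled by recent frequency."""
        if not self.recent:
            return self.base_threshold
        recent_rate = sum(self.recent) / len(self.recent)
        return self.base_threshold + self.penalty * recent_rate

    def decide(self, favorability):
        """Act if the moment is good enough, then record what we did."""
        act = favorability >= self.threshold()
        self.recent.append(1 if act else 0)
        return act


mixer = DetailDrivenMixer()
decisions = [mixer.decide(f) for f in [0.6, 0.6, 0.6, 0.95, 0.2]]
print(decisions)  # [True, False, False, True, False]
```

Nothing here calls a random number generator. The same mildly good spot (0.6) gets taken the first time and passed up right after, because the bar moved. An observer who cannot see the favorability score or the history window sees something that looks mixed.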

A mistake I often make is to choose actions as if I was assuming others know my decision algorithm and will exploit that to extract all the information. Most of the time this is silly.

This brings us to the issue of Glomarization.

Glomarization

Are you harboring any criminals? Did you rob a bank? Is there a tap on my phone? Does this make me look fat?

If when the answer is no I would tell you no, then refusing to answer is the same as saying yes. So if you want to avoid lying, and want to keep secrets, you need to sometimes refuse to answer questions, to avoid making refusing to answer too meaningful an action. Eliezer discussed such issues recently.

This section was the original motivation for writing the poker series up now, but having written it, I think a full treatment should mostly just be its own thing. And I’m not happy with my ability to explain these concepts concisely. But a few thoughts here.

The advantage of fully explicit meta-honesty, telling people exactly under what conditions you would lie or refuse to share information, is that it protects a system of full, reliable honesty.

The problem with fully explicit meta-honesty is that it vastly expands the necessary amount of Glomarization to say exactly when you would use it. 

Eliezer correctly points out that if the Feds ask you where you were last night, your answer of ‘I can neither confirm nor deny where I was last night’ is going to sound mighty suspicious regardless of how often you answer that way. Saying ‘none of your goddamn business’ is only marginally better. Also, letting them know that you always refuse to answer that question might not be the best way to make them think you’re less suspicious.

This means that full Glomarization isn’t practical unless (this actually does come up) your response to a question can reliably be ‘that’s a trap!’.

However, partial Glomarization is fine. As long as you mix in some refusing to answer when the answer wouldn’t hurt you, people don’t know much. Most importantly, they don’t know how often you’d refuse to answer. 

If the last five times you’ve refused to answer if there was a dragon in your garage, there was a dragon in your garage, your refusal to answer is rather strong evidence there’s a dragon in your garage.

If it only happened one of the last five times, then there’s certainly a Bayesian update one can make, but you don’t know how often there’s a Glomarization there, so it’s hard to know how much to update on that. The key question is, what’s the threshold where they feel the need to look in your garage? Can you muddy the waters enough to avoid that?

Once you’re doing that, it is almost certainly fine to answer ‘no’ when it especially matters that they know there isn’t a dragon there, because they don’t know when it’s important, or what rule you’re following. If you went and told them exactly when you answer the question, it would be bad. But if they’re not sure, it’s fine.
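To put rough numbers on the dragon example, here is a small Bayes’ rule sketch. The function name and every probability below are illustrative assumptions, not measurements. In practice the questioner does not know your refusal rates, which is the whole point.

```python
def posterior_dragon(prior, p_refuse_if_dragon, p_refuse_if_no_dragon):
    """P(dragon | you refused to answer), by Bayes' rule.

    All three inputs are probabilities in [0, 1]. The numbers used in the
    examples below are made up for illustration.
    """
    joint_dragon = prior * p_refuse_if_dragon
    joint_no_dragon = (1 - prior) * p_refuse_if_no_dragon
    total = joint_dragon + joint_no_dragon
    if total == 0:
        raise ValueError("a refusal is impossible under these assumptions")
    return joint_dragon / total


# Refuse only when there is a dragon: a refusal gives the game away.
only_when_guilty = posterior_dragon(0.1, 1.0, 0.0)

# Refuse half the time even when the garage is empty: a much weaker signal.
partial = posterior_dragon(0.1, 1.0, 0.5)

# Always refuse: a refusal carries no information, so the prior is unchanged.
full = posterior_dragon(0.1, 1.0, 1.0)
```

With a prior of 0.1, refusing only when guilty takes the posterior to 1, refusing half the time when innocent takes it to about 0.18, and always refusing leaves it at 0.1. Partial Glomarization sits in the middle, and the less they know about your innocent-refusal rate, the less confident they can be about which of these worlds they are in.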

One can complement that by understanding how conversations and topics develop, and not setting yourself up for questions you don’t want to answer. If you have a dragon in your garage and don’t want to lie about it or reveal that it’s there, it’s a really bad idea to talk about the idea of dragons in garages. Someone is going to ask. So when your refusal to answer would be suspicious, especially when it would be a potential sign of a heretical belief, the best strategy is to not get into position to get asked.

Which, in turn, means avoiding perfectly harmless things gently, invisibly, without saying that this is what you’re doing. Posts that don’t get written, statements not made, rather than questions not answered. As a new practitioner of such arts, hard and fast rules are good. As an expert, they only serve to give the game away.

Remember the illusion of transparency. Your counterfactual selves would need to act differently. But if no one knows that, it’s not a problem.



8 Responses to Simplified Poker Conclusions

  1. I’m curious: if your program lost to Phillips’s head to head, how did you beat him overall?

  2. Pingback: Rational Feed – deluks917

  3. Do you think Eliezer is wisely avoiding any statement about dragons in garages, while other practitioners are dancing too close to that edge?

  4. Peter Gerdes says:

    I’ve always been puzzled about the point of this whole debate. I mean, as you point out, it only matters if people actually believe you will behave in these counterfactual ways, and since there is simply no possible way for you to convince anyone that in the super unlikely case of a Gestapo officer who’s read Yudkowsky’s glomarization article you wouldn’t lie, there is absolutely no benefit to actually adopting such a meta-honesty policy.

    But even setting this aside the whole framework seems to utterly fall apart once we consider agents who aren’t logically omniscient. Quite simply, you don’t actually have the computational power to play out every aspect of every scenario in which you might not want an adversary to uncover some truth about you. However, the moment there is some specific possibility on the table that your adversary wishes to evaluate (is he hiding a dragon) they can spend a great deal of time thinking through all the practical upshots of dragon hiding (ok he’d need to get food from somewhere and he’d need some cover story. He’d need some way to hide the smell etc.. etc..) and work out what kinds of things someone hiding a dragon would need to hide or be inclined not to answer.

    Alright, so they now just question you (even at merely a meta-level about what you would lie about) and see if your responses look like those that would be given by someone who has very thoroughly and completely thought through all the practicalities of covering up their dragon possession. For instance, ask a bunch of “would you lie about X if you were hiding a dragon and questioned by someone suspicious” and after “how often you clean your house” toss in “where you do your food shopping.” Someone hiding a dragon will realize they can’t risk revealing the hog farm they’ve been buying dragon chow from while an innocent individual will probably not have thought through the intricacies of dragon care.

    This might seem like a variant of the “what about the Gestapo officer who has read this article” objection but it’s not actually uncommon at all. This is what the police do in every interrogation. Sure, they aren’t expecting a meta-honest suspect or used to dealing with glomarization but they certainly are on the lookout for answers which suggest an unlikely degree of knowledge/anticipation for an innocent individual and they would certainly talk to your friends to establish your usual habits (e.g. meta-honesty and glomarization) and look for salient deviations from them.

    • Peter Gerdes says:

      Hmm, perhaps a simpler way of making a related point is this.

      If you actually follow the advice about glomarization it is no longer improbable that you will be interrogated by someone who has read the rationalist literature on the subject and thought through the consequences. Investigators do their homework and being committed enough to glomarize frequently enough to do the intended work is a feature that will stick out like a sore thumb when your associates are interviewed and immediately send the investigator out to read the literature.

      Now maybe most investigators aren’t anywhere near this thorough but if you are facing an investigator who doesn’t even bother looking into your normal behavior your glomarization is irrelevant anyway.

    • TheZvi says:

      As I see it there are a few points:

      1. You interact a lot with the same people over and over again, especially your friend group, coworkers, family and so forth. It is a huge advantage to be able to trust that such people are telling the truth, without it making secrets impossible to keep. Doing meta-honesty right, or at least doing a decent job of it, is a huge practical advantage in these interactions.

      2. Reputations are totally a thing. As I point out, it’s not key that you actually do a game-theoretically-sound defense as such, but you want to be able to have a reputation for honesty and one for keeping secrets, and to be able to maintain both, and that’s a hard puzzle, both in theory and in practice.

      3. It’s an important AI problem slash problem for systems. Systems and AIs are often presumed to be either open source, or can be experimented on or simulated sufficiently that they might as well be in context. If such questions are key future design questions, you need to figure out how they can interact with other agents. And like most such problems, solving the problem for humans is a good way to understand the problem while getting practical use, as well.

      4. It’s an interesting problem. It’s good policy to study interesting problems even if you don’t know what they’re for.
