General Poker and the Prisoner's Dilemma

Chris Taylor

Contributor

So @Jason Waddell I'm sure you've seen this already, but we've probably got a non-zero amount of past or current poker players on here, so I figure it's worth the echo.

Jorbs is an excellent streamer, mainly does Slay the Spire but branches out into other strategy games on occasion, and into stuff like this as well.

Much of the poker terminology went over my head, but the familiar "If both me and my opponent do what we're supposed to here, we can both benefit, but if one of us tries to go for extra value, it can all fall apart" dynamic reminds me of a lot of situations, not just in Magic.
 

Jason Waddell

Administrator
Staff member

So @Jason Waddell I'm sure you've seen this already, but we've probably got a non-zero amount of past or current poker players on here, so I figure it's worth the echo.

Jorbs is an excellent streamer, mainly does Slay the Spire but branches out into other strategy games on occasion, and into stuff like this as well.

Much of the poker terminology went over my head, but the familiar "If both me and my opponent do what we're supposed to here, we can both benefit, but if one of us tries to go for extra value, it can all fall apart" dynamic reminds me of a lot of situations, not just in Magic.
I saw this in the VODs, didn't realize it had made its way to YouTube. I paid my bills in college via single table tournaments, and while I agree with Jorbs here in a way, I don't really think his disregard of the ICM model was necessarily correct.

However, sometimes with probability type things he ends up with the right answer through a very different and more convoluted approach. Recently they were talking about the probability of getting Cursed Key from the Act 1 boss. We both ended up with the same probability, but he and I took very different routes to get to the answer.

The point of that being that I think we have very different frameworks for thinking about certain problems. When I played tournament poker, during these endgame situations, I tried to think about fear and player psychology a lot. It was very clear that certain players were more afraid than others to push their chips in. But at the ends of tournaments, the blinds will simply swallow you if you don't act. This was very relevant as there was a hard cutoff for payouts (top 3 got paid varying amounts, others got nothing).

I just woke up, so I don't really have a proper analysis here. My experience was of single-table tournaments where the big stacks were almost working cooperatively to eliminate the small stacks until they got past the bubble. To me I found it most useful to observe the social dynamics and levels of perceived fear when making decisions, but typically we were 1) playing with much worse players than Jorbs probably has in mind and 2) much more shortstacked than 15BB.
 
The prisoner's dilemma is as follows:
Two prisoners each have the option to stay silent or to rat on the other. If both stay silent, each is jailed for 3 years. If only one of them rats, that prisoner gets a reduced sentence of just 1 year, while the silent one gets 10 years. If both of them rat, both get 5 years.
In the prisoner's dilemma it is always best to rat, yet the best joint outcome is for both to stay silent. However, if you stay silent, the other is better off ratting. It is a beautiful example showing that what is best for all parties may not be best for any one of them. Read: it is best for you if the rest plays fair and you cheat, just like real life. However, everyone else faces this same temptation, hence nobody plays fair, and this results in everyone actually being worse off. Sound familiar?
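The dominance argument above is easy to verify mechanically. Here is a minimal sketch using the sentence lengths from this post (3/3 for mutual silence, 1 for the lone rat, 10 for the lone silent prisoner, 5/5 for mutual ratting); payoffs are years in jail, so lower is better:

```python
SILENT, RAT = 0, 1
# years[(my_move, their_move)] -> (my_sentence, their_sentence)
years = {
    (SILENT, SILENT): (3, 3),
    (SILENT, RAT):    (10, 1),
    (RAT, SILENT):    (1, 10),
    (RAT, RAT):       (5, 5),
}

def best_response(opponent_move):
    """Return the move that minimizes my sentence against a fixed opponent move."""
    return min((SILENT, RAT), key=lambda me: years[(me, opponent_move)][0])

# Ratting is the best response to either opponent move: a dominant strategy.
assert best_response(SILENT) == RAT  # 1 year beats 3
assert best_response(RAT) == RAT     # 5 years beats 10

# Yet mutual silence (3, 3) is better for both than the equilibrium (5, 5).
assert years[(SILENT, SILENT)] < years[(RAT, RAT)]
```

Whatever the other prisoner does, ratting shaves years off your own sentence, which is exactly why the mutually better outcome is unstable.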

Now that that is out of the way: it is seldom right to compare something to the prisoner's dilemma.
Why, you ask? Well, because there is only one best strategy! (Unless you use it to impress someone who does not know that the prisoner's dilemma is even more boring than tic-tac-toe. That this is itself an instance of the prisoner's dilemma is left as an exercise for the reader.)

Most of the time it is better to compare your subject, e.g., poker, to the "what type of player are you?" problem. In this problem the game consists of a number of players with a number of strategies. Strategy 1 is beaten by strategy 2. However, strategy 2 is beaten by strategy 3. Strategy 3 is in turn beaten by strategy 4, and so on. The beauty is that strategy 1 beats some later strategy x (e.g., strategy 3), closing the cycle. Now the challenge becomes finding out what type of player the opponent is. Here is where the psychological part of poker comes into play. Reading what type of player the opponent is (and that type can change from hand to hand if the opponent is smart) will win you the game (assuming that everyone has a good memory and can do the probability calculus in their head). The beauty, if I remember correctly, is that the best strategy depends on the other players' strategies instead of there being a single best strategy no matter what.
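A minimal sketch of that cyclic structure, using rock-paper-scissors as the standard stand-in (the naming is mine, not from the post): no strategy is best outright, and the right choice depends entirely on reading the opponent.

```python
# Each strategy beats exactly one other, forming a cycle.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def counter(opponent_strategy):
    """Return the strategy that beats the given one."""
    return next(s for s, loses in BEATS.items() if loses == opponent_strategy)

assert counter("rock") == "paper"
assert counter("paper") == "scissors"
assert counter("scissors") == "rock"
# Every strategy is beaten by some other, so there is no dominant pure
# strategy: the game is about predicting the opponent, not memorizing an answer.
```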
 
Last edited:
The beauty, if I remember correctly, is that the best strategy depends on the other players' strategies instead of there being a single best strategy no matter what.

So, not to game-theory-nerd out too much, but certain situations call for certain strategies. What you're describing is a pure Nash equilibrium, in which a rational player will choose to play a single dominant strategy. However, these situations are special and at the bare minimum rely on perfect transparency: both players must be aware of the probability of all outcomes and the expected value of each outcome for each player in order to arrive at a truly dominant strategy.

However, a strategy can be more complex than "always push button A." Poker is actually played in reference to a Nash equilibrium of its own. Players can choose to deviate from the Nash equilibrium to portray different game states based on hidden information (i.e., bluffing about hidden cards), but the fundamental strategies they are portraying technically fall under a single equilibrium strategy. It's just a complex one rather than a simple one.

Please note that I'm not a poker expert and that Jason et al. are; I just studied some game theory in undergrad and wanted to clarify some niceties! And I wanted to suggest that you read the source material if you're interested in the theoretical underpinnings of these ideas, because it's actually fairly simple to parse (yes, I know the book looks daunting, but it's actually pretty accessible; much of the bulk is case studies).
 
Last edited:
So, not to game-theory-nerd out too much, but certain situations call for certain strategies. What you're describing is a pure Nash equilibrium, in which a rational player will choose to play a single dominant strategy. However, these situations are special and at the bare minimum rely on perfect transparency: both players must be aware of the probability of all outcomes and the expected value of each outcome for each player in order to arrive at a truly dominant strategy.

However, a strategy can be more complex than "always push button A." Poker is actually played in reference to a Nash equilibrium of its own. Players can choose to deviate from the Nash equilibrium to portray different game states based on hidden information (i.e., bluffing about hidden cards), but the fundamental strategies they are portraying technically fall under a single equilibrium strategy. It's just a complex one rather than a simple one.

Please note that I'm not a poker expert and that Jason et al. are; I just studied some game theory in undergrad and wanted to clarify some niceties! And I wanted to suggest that you read the source material if you're interested in the theoretical underpinnings of these ideas, because it's actually fairly simple to parse (yes, I know the book looks daunting, but it's actually pretty accessible; much of the bulk is case studies).
I also studied game theory (though a bit longer ago), and my point was that poker is not comparable to a Nash equilibrium, and hence not to the prisoner's dilemma. Maybe a simplified version of poker, but not true poker. That is, as you correctly point out, due to the hidden information: one can bluff. If it had a true Nash equilibrium, the game would be easy and boring.
I wanted to avoid the technicalities in my explanation.

However, for the game theory nerds: Nash equilibria are based on the assumption that all players play for their own best outcome, know that the other players play for their best outcome, and know that the others know this as well. Calculating a Nash equilibrium is easy: it is a strategy profile that makes the other players indifferent. All (finite) games have at least one Nash equilibrium. Where it becomes more interesting, and what shows why the Nash equilibrium is a flawed concept to start with, is when there are multiple Nash equilibria. Applying the same theory/calculation to select which equilibrium is best for a certain player typically results in an outcome that is not an equilibrium at all.
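The "make the other player indifferent" calculation can be sketched for the textbook 2x2 case, matching pennies (my choice of example, not from the post): the matcher wins 1 if the coins match, the mismatcher wins 1 otherwise.

```python
# If the row player (the matcher) shows heads with probability p, the column
# player's expected payoffs are:
#   play heads: p * (-1) + (1 - p) * (+1)   (matched with prob p, loses 1)
#   play tails: p * (+1) + (1 - p) * (-1)   (mismatched with prob p, wins 1)
# Indifference means these are equal, which solves to p = 1/2.

def column_payoff(p, column_move):
    """Column player's expected payoff when row shows heads with probability p."""
    if column_move == "heads":
        return p * -1 + (1 - p) * 1
    return p * 1 + (1 - p) * -1

p = 0.5  # the indifference-making mixture
assert column_payoff(p, "heads") == column_payoff(p, "tails") == 0.0

# Any other mixture gives the column player a strict preference to exploit:
assert column_payoff(0.7, "heads") != column_payoff(0.7, "tails")
```

The equilibrium mixture is pinned down entirely by the opponent's payoffs, which is the slightly counterintuitive flavor of the indifference method.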
 
Wait, people post here who aren't game theory nerds? :p

Instead of talking about the actual topic of this thread from 2013, I'm going to nerd out about the actually interesting thing about the prisoner's dilemma (or any coordination game, really, though I don't know if anyone's done this with hawk-and-dove).

Namely, if you play the Prisoner's Dilemma once, there's a clear dominant strategy (always defect). However, if you play a Prisoner's Dilemma tournament, where you play repeated rounds of the game against other players with some amount of memory, that strategy becomes absolute dogshit. If you allow competitors to remember even a single previous round, there are strategies that will soundly beat the defector's final score almost every time.

And that setting, where your strategy is dependent on figuring out what your opponent is doing, is where it becomes an actual game (and not just a toy example of one).
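A quick sketch of that tournament effect, under the assumption of a round-robin where every strategy plays every strategy (including a copy of itself) for many rounds. Payoffs per round are Axelrod's standard values (points, higher is better): both cooperate 3/3, lone defector 5, sucker 0, both defect 1/1.

```python
C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def always_defect(opponent_history):
    return D

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else C

def play(strat_a, strat_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strat_a(hist_b), strat_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

players = [always_defect, tit_for_tat]
totals = {p.__name__: 0 for p in players}
for a in players:
    for b in players:
        totals[a.__name__] += play(a, b)[0]

# Defector: 200 vs itself, plus 5 + 199*1 = 204 vs TFT  -> 404 total.
# TFT: 0 + 199*1 = 199 vs the defector, plus 600 vs itself -> 799 total.
assert totals["always_defect"] == 404
assert totals["tit_for_tat"] == 799
```

The defector still wins each individual pairing against tit-for-tat (204 vs 199), but its total across the field is crushed, which is the whole point of the tournament setting.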
 
Wait, people post here who aren't game theory nerds? :p

Instead of talking about the actual topic of this thread from 2013, I'm going to nerd out about the actually interesting thing about the prisoner's dilemma (or any coordination game, really, though I don't know if anyone's done this with hawk-and-dove).

Namely, if you play the Prisoner's Dilemma once, there's a clear dominant strategy (always defect). However, if you play a Prisoner's Dilemma tournament, where you play repeated rounds of the game against other players with some amount of memory, that strategy becomes absolute dogshit. If you allow competitors to remember even a single previous round, there are strategies that will soundly beat the defector's final score almost every time.

And that setting, where your strategy is dependent on figuring out what your opponent is doing, is where it becomes an actual game (and not just a toy example of one).
I am sorry, but in the prisoner's dilemma, without consultation/side agreements, it is always best to defect. Period. No matter the number of times you play it, it is always better for you to defect.

You are right that something only becomes interesting when you have to figure out your opponent's strategy.

A coordination game is something different from the prisoner's dilemma. In a coordination game it is never better to deviate, and you can reach the best outcome for all.
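That distinction can be sketched with a pure coordination game, say choosing which side of the road to drive on (my example, not from the post): matching is all that matters, and once matched, unilateral deviation can only hurt you, unlike in the PD.

```python
LEFT, RIGHT = "left", "right"

def payoff(my_side, their_side):
    """Both players score 1 if they match, 0 otherwise."""
    return (1, 1) if my_side == their_side else (0, 0)

# At either matched outcome, deviating strictly lowers your own payoff,
# so both matched outcomes are stable equilibria and nobody is tempted to cheat.
assert payoff(LEFT, LEFT)[0] > payoff(RIGHT, LEFT)[0]
assert payoff(RIGHT, RIGHT)[0] > payoff(LEFT, RIGHT)[0]
```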
 
oh, is it real 'talking game theory' hours? fuck yeah...

Iterated prisoner's dilemma is fun, but central IMO to any broader discussion of "prisoner's dilemma" is its most compelling real-world example: nuclear war. 'Co-operate' and 'defect' are broadly analogous to 'don't use nuclear weapons' and 'do use nuclear weapons', with additional gamestates representing various homeostatic arrangements of nuclear policy. It's all well and good to point out the differences between single and iterated Prisoner's Dilemma, but there's very little chance of Iterated Nuclear Missile Launching gameplay (and yes, nuclear war is a game; perhaps the Greatest Game). Accordingly, much of the research on game theory in the 1950s and early '60s (when John Nash published his research on game theory) was funded by the RAND Corporation, the US military-industrial complex's "R & D" wing (thus the name RAND), specifically to address the non-iterated 'outbreak of nuclear war' version of the game.

The political science versions of the game, though, seemed too abstracted to me. Countries are persistent, the rules say, so prisoners can't 'escape'; they can only wither away, fail to co-operate while accruing a negative reputation, or engage in broad international co-operation. The example of North Korea is usually taken as proof of the validity of the model, but I think it disproves it; NK and similar states exist under a sanctions regime that itself constitutes the multilateral 'cooperation' of the winning countries. A great deal of blame, in the IR model, is assigned to states that 'fail to cooperate with international norms', while no blame is assigned to the countries which then exile and immiserate these 'poor sports'.

Rusje seems to be describing the historically highest-performing strategy, something Robert Axelrod, who applied programmatic logic to the Prisoner's Dilemma in the '80s, called "tit for tat": the 'tier zero deck' of Prisoner's Dilemma's 'Standard format'. But tit for tat (also called 'nice until naughty' and a dozen other names; the strategy is to co-operate until you are defected against, at which point you shift to defecting until you are co-operated with) isn't necessarily mathematically the best possible strategy. As the Stanford Encyclopedia of Philosophy writes:
Axelrod attributed the success of TFT to four properties. It is nice, meaning that it is never the first to defect. The eight nice entries in Axelrod's tournament were the eight highest ranking strategies. It is retaliatory, making it difficult for it to be exploited by the rules that were not nice. It is forgiving, in the sense of being willing to cooperate even with those who have defected against it (provided their defection wasn't in the immediately preceding round). An unforgiving rule is incapable of ever getting the reward payoff after its opponent has defected once. And it is clear, presumably making it easier for other strategies to predict its behavior so as to facilitate mutually beneficial interaction.

Suggestive as Axelrod's discussion is, it is worth noting that the ideas are not formulated precisely enough to permit a rigorous demonstration of the supremacy of TFT. [...] Evidence has emerged that the striking success of TFT in Axelrod's tournaments may be partly due to features particular to Axelrod's setup. Rapoport et al (2015) suggest that, instead of conducting a round-robin tournament in which every strategy plays every strategy, one might divide the initial population of strategies randomly into equal-size groups, conduct round-robin tournaments within each group, and then a championship round-robin tournament among the group winners. They find that, with the same initial population of strategies present in Axelrod's first tournament, the strategies ranked two and six in that tournament both perform considerably better than top-ranked TFT. Kretz (2011) finds that, in round-robin tournaments among populations of strategies that can only condition on a small number of prior moves (of which TFT is clearly one) relative performance of strategies is sensitive to the payoff values in the PD matrix. (Interestingly, this is so even if the PDs all satisfy or fail to satisfy the condition R+P=T+S, characterizing exchange games, and if they all satisfy or fail to satisfy the RCA condition, R>½(T+S).)

Equally telling, perhaps, are the results of a more recent tournament using the same parameters as Axelrod did. To mark the twentieth anniversary of the publication of Axelrod's book, a number of similar tournaments were staged at the IEEE Congress on Evolutionary Computing in Portland in 2004 and the IEEE Symposium on Computational Intelligence and Games in Colchester 2005. Kendall et al 2007 describes the tournaments and contains several papers by authors who submitted winning entries. Most of the tournaments were deliberately designed to differ significantly from Axelrod's (and some of these are briefly discussed in the section on signaling below). In the one that most closely replicated Axelrod's tournaments, however, TFT finished only fourteenth out of the fifty strategies submitted.
[...]
Li (2007) says explicitly that the idea behind APavlov was to make an educated guess about what strategies would be entered, find an accurate, low-cost way to identify each during the initial stages of the game and then play an optimal strategy against each strategy so identified. For example, the strategies Cu, Du, GRIM, RANDOM, TFT, TFTT, TTFT, and P1, described in a supplementary table, had all appeared in previous tournaments. By defecting in round one, cooperating in round three, and choosing the opposite of one's opponent's round-one move in round two, one could identify any opposing strategy from among these nine in three moves. This identification process would be costly, however, because, by its first move, it eliminates any opportunity of cooperation with GRIM. Li chooses instead to employ TFT over the first six rounds as his identifying strategy, reducing cost at the expense of accuracy and range. It is worth noting that TFT cannot distinguish any pair of strategies that satisfy Axelrod's niceness condition (never being the first to defect). This means that it forgoes the chance to exploit unconditional cooperators. Li's entry won its tournament only because he guessed correctly that not many unconditional cooperators would be present.

The Prisoner's Dilemma is far less solved than you think.
 
Last edited:
Nice one!
However, what if an opponent switches strategies?

More to the point, and as others said above: iteration is sometimes not possible.

Thank you all for such a beautiful thread drift into game theory!
 

Onderzeeboot

Ecstatic Orb
I can recommend Jostein Gaarder's Sophie's World if you want an accessible recap of (mostly Western) philosophy. Good book, good pace!
 
I also studied game theory (though a bit longer ago), and my point was that poker is not comparable to a Nash equilibrium, and hence not to the prisoner's dilemma. Maybe a simplified version of poker, but not true poker. That is, as you correctly point out, due to the hidden information: one can bluff. If it had a true Nash equilibrium, the game would be easy and boring.
I wanted to avoid the technicalities in my explanation.

However, for the game theory nerds: Nash equilibria are based on the assumption that all players play for their own best outcome, know that the other players play for their best outcome, and know that the others know this as well. Calculating a Nash equilibrium is easy: it is a strategy profile that makes the other players indifferent. All (finite) games have at least one Nash equilibrium. Where it becomes more interesting, and what shows why the Nash equilibrium is a flawed concept to start with, is when there are multiple Nash equilibria. Applying the same theory/calculation to select which equilibrium is best for a certain player typically results in an outcome that is not an equilibrium at all.
poker's a complicated game, so everyone has their own methods
 
oh, is it real 'talking game theory' hours? fuck yeah...

Iterated prisoner's dilemma is fun, but central IMO to any broader discussion of "prisoner's dilemma" is its most compelling real-world example: nuclear war. 'Co-operate' and 'defect' are broadly analogous to 'don't use nuclear weapons' and 'do use nuclear weapons', with additional gamestates representing various homeostatic arrangements of nuclear policy. It's all well and good to point out the differences between single and iterated Prisoner's Dilemma, but there's very little chance of Iterated Nuclear Missile Launching gameplay (and yes, nuclear war is a game; perhaps the Greatest Game). Accordingly, much of the research on game theory in the 1950s and early '60s (when John Nash published his research on game theory) was funded by the RAND Corporation, the US military-industrial complex's "R & D" wing (thus the name RAND), specifically to address the non-iterated 'outbreak of nuclear war' version of the game.

The political science versions of the game, though, seemed too abstracted to me. Countries are persistent, the rules say, so prisoners can't 'escape'; they can only wither away, fail to co-operate while accruing a negative reputation, or engage in broad international co-operation. The example of North Korea is usually taken as proof of the validity of the model, but I think it disproves it; NK and similar states exist under a sanctions regime that itself constitutes the multilateral 'cooperation' of the winning countries. A great deal of blame, in the IR model, is assigned to states that 'fail to cooperate with international norms', while no blame is assigned to the countries which then exile and immiserate these 'poor sports'.

Rusje seems to be describing the historically highest-performing strategy, something Robert Axelrod, who applied programmatic logic to the Prisoner's Dilemma in the '80s, called "tit for tat": the 'tier zero deck' of Prisoner's Dilemma's 'Standard format'. But tit for tat (also called 'nice until naughty' and a dozen other names; the strategy is to co-operate until you are defected against, at which point you shift to defecting until you are co-operated with) isn't necessarily mathematically the best possible strategy. As the Stanford Encyclopedia of Philosophy writes:


The Prisoner's Dilemma is far less solved than you think.
The OG (read: nuclear war) prisoner's dilemma is solved. I was always referring to that one; that's why I explicitly stated the original problem with its rules. By making it iterative you change the rules, and hence a different solution is likely better. Giving two different problems the same name is not smart…
 