Recent news stories, picked up all over blogland, reported that Tit-for-Tat has been dethroned as the best strategy in iterated prisoners’ dilemma games. In a computer tournament, a team from Southampton University won with a new strategy, beating the Tit-for-Tat strategy for the first time.
Here’s the background. Prisoners’ Dilemma is a game with two players. Each player chooses a move, which is either Cooperate or Defect. Then the players reveal their moves to each other. If both sides Cooperate, they each get three points. If both Defect, they each get one point. If one player Cooperates and the other Defects, then the defector gets five points and the cooperator gets none. The game is interesting because no matter what one’s opponent does, one is better off choosing to Defect; but the most mutually beneficial result occurs when both players Cooperate.
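The payoff structure, and the fact that Defecting dominates in a single round, can be written out in a few lines of Python (the encoding and function name here are my own, just for illustration):

```python
# Payoff matrix for one round of Prisoners' Dilemma.
# Moves are "C" (Cooperate) or "D" (Defect).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("D", "D"): (1, 1),  # mutual defection
    ("C", "D"): (0, 5),  # cooperator gets nothing, defector gets five
    ("D", "C"): (5, 0),
}

def payoff(my_move, their_move):
    """Return (my_points, their_points) for a single round."""
    return PAYOFFS[(my_move, their_move)]

# Whatever the opponent does, Defecting scores at least as much:
#   against C: Defect earns 5, Cooperate earns 3
#   against D: Defect earns 1, Cooperate earns 0
```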
Things get more interesting when you iterate the game, so that the same pair of players plays many times in a row. A player can then base its strategy on what the opponent has done recently, which changes the opponent’s incentives in subtle ways. This game is an interesting abstract model of adversarial social relationships, so people are interested in understanding its strategy tradeoffs.
For at least twenty years, the best-looking strategy has been Tit-for-Tat, in which one starts out by Cooperating and then copies whatever action the opponent used last. This strategy offers an appealing combination of initial friendliness with measured retaliation for an opponent’s Defections. In tournaments among computer players, Tit-for-Tat won consistently.
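The strategy is simple enough to state in one line of Python (a sketch, assuming the opponent’s move history is passed in as a list):

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first move; afterwards, copy the opponent's last move.

    opponent_history: list of the opponent's past moves ("C" or "D"),
    oldest first.
    """
    return "C" if not opponent_history else opponent_history[-1]
```

So a Tit-for-Tat player opens with Cooperate, retaliates once for each Defection, and forgives as soon as the opponent returns to cooperating.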
But this year, the Southampton team unveiled a new strategy that won the latest tournament. Many commentators responded by declaring that Tit-for-Tat had been dethroned. But I think that conclusion is wrong, for reasons I’ll explain.
But first, let me explain the new Southampton strategy. (This is based on press accounts, but I’m confident that it’s at least pretty close to correct.) They entered many players in the tournament. Their players divide into two groups, which I’ll call Stars and Stooges. The Stars try to win the tournament, and the Stooges sacrifice themselves so the Stars can win. When facing a new opponent, one of these players starts out by making a distinctive sequence of moves. Southampton’s players watch for this distinctive sequence, which allows them to tell whether their opponents are other Southampton players. When two Southampton players are playing each other, they collude to maximize their scores (or at least the score of the Star(s), if any, among them). When a Star plays an outsider, it tries to score points normally; but when a Stooge plays an outsider, it always Defects, to minimize the opponent’s score. Thus the Stooges sacrifice themselves so that the Stars can win. And indeed, the final results show a few Stars at the top of the standings (above Tit-for-Tat players) and a group of Stooges near the bottom.
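Based on the press accounts described above, the recognition trick might look roughly like this (the handshake sequence, names, and the Star’s fallback behavior are all invented for illustration; the actual Southampton code was surely more refined):

```python
# Hypothetical opening sequence used to signal team membership.
HANDSHAKE = ["C", "D", "D", "C", "D"]

def southampton_move(role, my_history, their_history):
    """role: "star" or "stooge". Histories are lists of past moves,
    oldest first. Returns this player's next move, "C" or "D"."""
    turn = len(my_history)
    if turn < len(HANDSHAKE):
        # Announce ourselves with the distinctive sequence.
        return HANDSHAKE[turn]
    teammate = their_history[:len(HANDSHAKE)] == HANDSHAKE
    if teammate:
        # Collude: the Stooge feeds points to the Star by
        # Cooperating while the Star Defects (5 points per round).
        return "C" if role == "stooge" else "D"
    # Opponent is an outsider.
    if role == "stooge":
        return "D"  # always Defect, dragging the outsider's score down
    # The Star plays normally against outsiders (Tit-for-Tat-like here).
    return their_history[-1]
```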
If we look more closely, the Southampton strategy doesn’t look so good. Apparently, Tit-for-Tat still scores higher than the average Southampton player – the sacrifice (in points) made by the Stooges is not fully recouped by the Stars. So Tit-for-Tat will still be the best strategy, both for a lone player, and for a team of players, assuming the goal is to maximize the sum of the team members’ scores. (Note that a team of Tit-for-Tat players doesn’t need to use the Southampton trick for recognizing fellow team members, since Tit-for-Tat players who play each other will always cooperate, which is the team-optimal thing to do.)
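The claim that a team of Tit-for-Tat players needs no handshake is easy to check with a toy simulation (my own sketch, not the tournament code): two Tit-for-Tat players lock into mutual cooperation from the first round and each collect the maximum sustainable three points per round.

```python
PAYOFFS = {("C", "C"): (3, 3), ("D", "D"): (1, 1),
           ("C", "D"): (0, 5), ("D", "C"): (5, 0)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def play(strategy_a, strategy_b, rounds=200):
    """Run an iterated game; each strategy sees the other's move history."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_b)
        b = strategy_b(hist_a)
        pa, pb = PAYOFFS[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

# Two Tit-for-Tat players cooperate every round: 3 points each per round,
# which is the best any pair of teammates can jointly achieve.
```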
So it seems that all the Southampton folks discovered is a clever way to exploit the rules of this particular tournament, with its winner-take-all structure. That’s clever, but I don’t think it has much theoretical significance.
UPDATE (Friday 22 October): The comments on this post are particularly good.
Trumping Tit-for-Tat
I can hardly be considered a student of William Riker (no dangit, not that William Riker), but his course on Strategy in Politics was one that I remember vividly and actually occasionally refer back to when I get into arguments…
Wouldn’t the Southampton strategy actually do better, on average, than Tit-for-Tat in a population where people following that strategy dominated?
What I’m getting at is, in an evolutionary situation where most of the reward falls disproportionately on people at the top, a team that followed the Southampton strategy would ‘win’ consistently, eventually dominating the population, even if initially their average scores were lower.
This reminds me of the advantages enjoyed by early farming societies, as described by Jared Diamond in Guns, Germs, and Steel. Sure, the farmers endured low status, a tough, repetitive lifestyle, and a protein-poor diet, but they were able to field a few professional soldiers. These soldiers could then dominate nearby tribes of hunter-gatherers and treat them in any way their society saw fit. The benefits went mostly to the soldiers, but at least the farmers were less likely to be crushed by some OTHER army…
Humans have long defeated tit-for-tat. I wonder how the iterated prisoner’s dilemma could be modified to allow these bots to leverage their advantage.
While I was away, you all seem to have homed in on the key issue. The Stars beat Tit-for-Tat but the Southampton team did worse on average than Tit-for-Tat. So notions of collective good (or payoffs on the side) cannot by themselves make the strategy a good idea. Only if we introduce a winner-take-all rule, or some other kind of nonlinear payoff scheme, for the tournament as a whole, does the strategy become attractive.
There’s another aspect to this whole story, which is that part of the Stooges’ strategy relied on dragging down the scores of nonmembers by consistently Defecting. To the extent that this trick worked, it did not increase the absolute score of the Southampton team but merely dragged down the scores of others. To put it another way, if your utility depends only on the scores of your own team members (and not on the scores of others), then this part of the strategy did not benefit the Southampton team at all in terms of total utility, even if it did move them up in the standings.
What would really impress me is a teamwork-based strategy such that the average score of the team members is higher than it would have been had they all used Tit-for-Tat.
The fact that the Southampton gang’s overall score is lower than would be the score of the same size gang using tit-for-tat, taken along with comments about stooges, kickbacks and cannon fodder, suggests that we implicitly think of such iterated games as having nonlinear payoffs for the highest score. For elections and wars the nonlinearity is obvious.
For economic and economic/political analogs of PD, things get more complicated. You have increasing returns in some cases, decreasing returns in others. But since scores (measured in wealth) don’t bear a linear or even consistent relationship to utility (measured in hard-to-quantify terms of what different people decide to value most) you can get situations where the stooges (without tangible kickbacks) see their utility as positive just as the stars do.
I can think of many situations in real life where some members of a group sacrifice their wellbeing for the advantage of others. Cannon fodder comes to mind. Sometimes abstract principles are used to persuade soldiers their sacrifice is worthwhile, even though the primary benefit doesn’t accrue to them or their immediate kin.
Another case is the political hatchet man. One person spews negative propaganda against a political opponent, and himself becomes disliked, while the allied candidate remains above the fray.
Sometimes, in real life, such collusion involves kickbacks — the stars would secretly give some of their points to the stooges.
Many evolutionary strategies involve an individual sacrificing so that closely related individuals can reproduce more, thereby spreading some of his own genes indirectly.
Wasn’t tit for tat somewhat dethroned back in 1987, when Axelrod began experimenting with genetic algorithms? From page 9 of Axelrod’s paper:
The Southampton strategy is innovative, to say the least, but I wonder how it would do against an evolved solution, which may also be capable of learning how to exploit the sacrificial competitors.
Great post… The relationship between the environment and the strategy is fascinating to me. Isn’t any strategy best in one environment and worst in another? I’m sure you could design a game where TFT is the *worst* strategy possible. The trick is knowing what environment you’re in.
I’m not a game theorist; I’m just a poker player who knows a little game theory.
Is Stars and Stooges a strategy for Prisoner’s Dilemma, or is it a strategy for iterated-games tournaments?
It would be interesting to see how the strategy would fare with a different game, say, Roshambo.
It may be the case that Stars and Stooges’ success says little about the Prisoner’s Dilemma, and enormous amounts about collusion in tournaments.
Very interesting Adam…
I personally believe that many (if not most) political debates boil down to “what are we trying to optimize here?” (e.g. is our overall goal to increase success of the most-successful individual, average success of all individuals, median success of all individuals, success of the least-successful individual, etc…). In this light a multi-player entry in a Prisoner’s Dilemma tournament serves as a model for a society whose goal is to produce a given outcome (in this case highest possible success of the most-successful individual) through the emergent behavior of all entrants. If the winning conditions were changed, a different strategy would be required to maximize score.
I think that the theoretical significance is that these sorts of strategies may start to shed more light on a variety of “irrational” actions, like saving a stranger from an onrushing train, or volunteering for a special forces unit in the army. Yes, you’re taking risks, but there’s a benefit to a group. How much of a risk is worth what payoffs? Does the group as a whole do better?
I’d say it has a very large theoretical significance, inasmuch as the Tit-For-Tat result has any theoretical significance.
Why is Stars/Stooges an artifact (my term), but Tit-For-Tat not an artifact?
That is, the deep significance of the result is to cast light on how a complex optimal strategy can evolve even from rules which seem to favor other strategies. Tit-for-Tat exists in a ruleset which has features which make it a very bad model for real-life interaction, given the simple nature of the exchanges – indeed, much of real-life politics (e.g. “divide and conquer”) can be viewed as successful complexity over the minimal rules.
Thus, showing how adding just a little *internal* complexity (i.e., not changing the payoffs, but organizing a gang of sorts), can win even here – that’s *profound*.
I don’t mean the following in a nasty way, but what you said strikes me as the classic *negative* security reaction when someone finds a clever exploit in a well-known system – i.e. that’s cheating, it’s out-of-band, doesn’t really affect the system, etc. etc.
What is the “theoretical significance” of these PD games? The significance, if any, is surely that they’re simple models of real-life situations. In that context, Southampton is significant not in that it “dethrones TFT”, but that it adds another dimension to the model: what is the goal? Winner-takes-all situations do occur sometimes in real life, after all.
Great if you have a bunch of people you don’t mind throwing to the wolves, and you can convince them to do it themselves.