Prisoner's Dilemma & Nash Equilibrium
Why two rational actors can each do worse than they jointly could — and when repetition restores cooperation.
Built and reviewed by Stephen Omukoko Okoth
Mathematical Economist · ex-Morgan Stanley FI · Equilar
Theory
What the model says, and why
Two suspects are arrested. Police separate them and offer each the same deal: confess against your partner and walk free while your partner takes the full sentence; if both confess, both serve a moderate sentence; if neither confesses, both are convicted only on a lesser charge. Whatever the partner does, each player does better by confessing (defecting), so confessing is a dominant strategy. But if both confess, both end up worse off than if both had stayed silent. That’s the dilemma.
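The payoffs satisfy:
T > R > P > S and 2R > T + S
T = Temptation (defect when partner cooperates), R = Reward (mutual cooperate), P = Punishment (mutual defect), S = Sucker (cooperate when partner defects). The first inequality makes defection dominant (T > R and P > S) while leaving mutual cooperation better for both than mutual defection (R > P). The second means that taking turns exploiting each other cannot beat mutual cooperation, which keeps mutual cooperation Pareto-optimal.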
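The two conditions are easy to check mechanically. The following is a minimal sketch; the function name `is_prisoners_dilemma` is illustrative and not taken from the page's own code.

```python
def is_prisoners_dilemma(T: float, R: float, P: float, S: float) -> bool:
    """Return True when the payoffs satisfy both Prisoner's Dilemma conditions."""
    ordering = T > R > P > S        # defection dominates, yet R beats P
    no_turn_taking = 2 * R > T + S  # mutual cooperation beats alternating exploitation
    return ordering and no_turn_taking


print(is_prisoners_dilemma(5, 3, 1, 0))  # payoffs T=5, R=3, P=1, S=0
print(is_prisoners_dilemma(7, 3, 1, 0))  # 2R = 6 is not greater than T + S = 7
```

With T = 7 the ordering still holds, but the players could do better on average by alternating between exploiting and being exploited, so the second condition fails.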
Nash equilibrium in the one-shot game is (Defect, Defect). Each player’s best response to defection is defection, so neither can gain by deviating alone. The Pareto-optimal outcome (Cooperate, Cooperate) is not an equilibrium: given that the other plays Cooperate, you do better by deviating to Defect.
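The best-response argument can be verified by brute force. This sketch assumes the example payoffs T = 5, R = 3, P = 1, S = 0 and checks every pure-strategy profile for profitable unilateral deviations; the names `PAYOFFS` and `pure_nash_equilibria` are illustrative.

```python
from itertools import product

ACTIONS = ("Cooperate", "Defect")

# PAYOFFS[(a, b)] = (A's payoff, B's payoff), assuming T=5, R=3, P=1, S=0
PAYOFFS = {
    ("Cooperate", "Cooperate"): (3, 3),
    ("Cooperate", "Defect"): (0, 5),
    ("Defect", "Cooperate"): (5, 0),
    ("Defect", "Defect"): (1, 1),
}


def pure_nash_equilibria(payoffs):
    """Return every profile from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        # A's action must be a best response to b, and B's to a.
        a_is_best = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in ACTIONS)
        b_is_best = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))  # [('Defect', 'Defect')]
```

Only pure strategies are enumerated here, which is sufficient for this game because Defect strictly dominates Cooperate.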
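The repeated game changes the analysis. If players meet repeatedly with a high enough probability of continuation (discount factor δ close to 1), cooperation can be sustained as a Nash equilibrium, because the one-time gain from defecting is outweighed by the future cooperation it forfeits. The tit-for-tat strategy (cooperate first, then mirror the opponent’s previous move) won Axelrod’s famous computer tournaments (1980), and similar reciprocity shows up in nature (vampire-bat blood sharing, cleaner fish) and in human conflict (the informal “live and let live” truces of World War I trench warfare).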
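To make the tit-for-tat rule concrete, here is a small simulation sketch. It assumes the example payoffs T = 5, R = 3, P = 1, S = 0 and an undiscounted ten-round match; the strategy and function names are illustrative and do not reproduce Axelrod's tournament code.

```python
# (my move, their move) -> my payoff, assuming T=5, R=3, P=1, S=0
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect in every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated match and return the undiscounted totals (A, B)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's past moves
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(tit_for_tat, always_defect))  # (9, 14)
```

Tit-for-tat never outscores its opponent within a single match: it loses once to a defector and then matches it. Its strength is that pairs of reciprocators earn far more together (30 each) than a defector earns against it (14).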
Why this matters beyond game theory. The Prisoner’s Dilemma is the cleanest model of situations where individual rationality doesn’t produce collective rationality. Examples: arms races, tax evasion, climate change, oil cartels, advertising wars, fisheries. The structural intuition — that repeated interaction with credible punishment can rescue cooperation — is one of the most important ideas in social science.
Interactive playground
Move the parameters, watch the equilibrium move
Payoffs
Set the four numbers
Status
These payoffs satisfy the Prisoner's Dilemma conditions: T > R > P > S and 2R > T + S.
Payoff matrix
Two players, two strategies
| Player A \ Player B | Cooperate | Defect |
|---|---|---|
| Cooperate | (3, 3) | (0, 5) |
| Defect | (5, 0) | (1, 1) |
Each cell is (A’s payoff, B’s payoff). The Nash equilibrium is bottom-right, even though top-left is Pareto-superior.
Equilibrium analysis
One-shot game
Nash equilibrium
(Defect, Defect)
Pareto-optimal
(Cooperate, Cooperate)
Dilemma size (R − P)
2.00
How much each player loses when both defect instead of both cooperating
Repeated game
When repetition supports cooperation
Cooperate forever (vs TFT)
38.49
Defect once (vs TFT)
16.83
Defect always (vs TFT)
12.83
Pays only P forever
Cooperation is stable: the payoff from cooperating forever exceeds the payoff from a one-time defection followed by punishment.
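The comparison above can be reproduced with a few lines of arithmetic. The page does not state the discount factor or horizon behind its figures; a 20-round horizon with δ = 0.95 happens to reproduce them, so this sketch assumes those values along with the example payoffs T = 5, R = 3, P = 1, S = 0. All names are illustrative.

```python
def discounted_sum(payoffs, delta):
    """Present value of a payoff stream, with round t weighted by delta**t."""
    return sum(p * delta**t for t, p in enumerate(payoffs))


T, R, P, S = 5, 3, 1, 0
DELTA, ROUNDS = 0.95, 20  # assumed values, not stated on the page

# Mutual cooperation in every round.
cooperate = discounted_sum([R] * ROUNDS, DELTA)
# Grab T once, then receive the punishment payoff P in every later round.
defect_then_punished = discounted_sum([T] + [P] * (ROUNDS - 1), DELTA)
# Mutual defection from the first round.
mutual_defection = discounted_sum([P] * ROUNDS, DELTA)

print(round(cooperate, 2))             # 38.49
print(round(defect_then_punished, 2))  # 16.83
print(round(mutual_defection, 2))      # 12.83

# Infinite-horizon threshold when a defection is punished with P forever:
# R / (1 - d) >= T + d * P / (1 - d)  is equivalent to  d >= (T - R) / (T - P)
critical_delta = (T - R) / (T - P)
print(critical_delta)  # 0.5
```

Under these assumptions cooperation is worth more than twice the deviation payoff, and with punishment by permanent mutual defection any δ of at least 0.5 would sustain it in the infinite-horizon game.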
In the classroom
How to teach it well
Run it as a game first. Before any theory, have students play the dilemma in pairs for ten minutes, one round at a time. Many cooperate the first time; almost all defect by the third round. The defection trajectory itself is a teaching moment.
Connect to real cases. Cartels (OPEC), arms races (Cold War), tax compliance, environmental treaties, advertising wars between Coke and Pepsi. Each one is a Prisoner’s Dilemma with a repeated-game escape mechanism (institutions, monitoring, credible punishment). The Cold War’s MAD doctrine worked precisely because repeated interaction with credible punishment made mutual restraint stable.
Common student trap. Many believe the “rational” outcome should be cooperation because it’s Pareto-better. Push back: rational means choosing your best response while taking the opponent’s strategy as given. Within that frame, defection is rational. The dilemma isn’t about irrationality; it’s about the gap between individual and collective rationality, which is exactly why institutions exist.
African policy applications. Tax evasion in fragmented sectors, coordination failures in regional infrastructure, deforestation as a multi-player dilemma. The same structural insight — that repeated interaction with credible monitoring rescues cooperation — applies to public-finance design.