A Nash equilibrium is a strategy profile where no player can improve their payoff by changing only their own strategy.
In other words, each player’s strategy is a best response to the strategies chosen by everyone else.
Core Idea
A Nash equilibrium is stable against unilateral deviation:
- Unilateral means one player changes while everyone else keeps their strategy fixed.
- Deviation means switching to a different strategy.
If changing alone cannot help any player, the strategy profile is a Nash equilibrium.
This does not mean the outcome is the best possible outcome for everyone. It only means no single player can do better by switching alone.
Formal Definition
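Suppose there are $n$ players, indexed by $i = 1, \dots, n$.
Let:
- $s_i$ be player $i$’s strategy
- $s_{-i}$ be the strategies of every player except $i$
- $u_i(s_i, s_{-i})$ be player $i$’s payoff
A strategy profile $s^* = (s_1^*, \dots, s_n^*)$ is a Nash equilibrium if, for every player $i$ and every alternative strategy $s_i$:
$$u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*)$$
Meaning: once everyone else is playing their equilibrium strategy, player $i$ cannot do better by switching away from $s_i^*$.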
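The definition can be checked mechanically: for each player, try every alternative strategy while holding the others fixed, and see whether any switch pays more. Below is a minimal Python sketch, assuming a two-player game stored as a dictionary and using the prisoner’s dilemma payoffs from the example in this note. The names `PAYOFFS`, `ACTIONS`, and `is_nash` are illustrative and do not come from any library.

```python
from itertools import product

# Payoffs indexed by (A's action, B's action) -> (A's payoff, B's payoff).
# "C" = cooperate, "D" = defect; values follow the prisoner's dilemma in this note.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def is_nash(profile, payoffs=PAYOFFS, actions=ACTIONS):
    """Return True if no player gains by deviating alone from `profile`."""
    for i in range(len(profile)):
        current = payoffs[profile][i]
        for alternative in actions:
            deviated = list(profile)
            deviated[i] = alternative  # only player i changes; the rest stay fixed
            if payoffs[tuple(deviated)][i] > current:
                return False
    return True


equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print(equilibria)  # [('D', 'D')]
```

The check uses a strict inequality, so a profile where a player is merely indifferent to switching still counts as an equilibrium, which matches the weak inequality in the definition.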
Key Points
- Each player’s strategy is a best response to the strategies of the other players.
- No player has an incentive to change their strategy alone.
- A Nash equilibrium can be bad for all players.
- A game can have one Nash equilibrium, many Nash equilibria, or no pure-strategy Nash equilibrium.
- Mixed strategies can create equilibria when pure strategies do not: every finite game has at least one Nash equilibrium once mixed strategies are allowed.
Prisoner’s Dilemma Example
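This is the classic prisoner’s dilemma payoff matrix. Each cell is written as (Player A’s payoff, Player B’s payoff):
| | B cooperates | B defects |
| --- | --- | --- |
| A cooperates | (3, 3) | (0, 5) |
| A defects | (5, 0) | (1, 1) |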
Based on the payoffs:
- if Player B cooperates, Player A’s best response is to defect (get 5 instead of 3)
- if Player B defects, Player A’s best response is to defect (get 1 instead of 0)
- if Player A cooperates, Player B’s best response is to defect (get 5 instead of 3)
- if Player A defects, Player B’s best response is to defect (get 1 instead of 0)
So (Defect, Defect) is the Nash equilibrium.
Even though (Cooperate, Cooperate) gives both players a higher payoff than (Defect, Defect), it is not a Nash equilibrium. If A cooperates, B can improve from 3 to 5 by defecting. If B cooperates, A can improve from 3 to 5 by defecting.
The important distinction:
- (Cooperate, Cooperate) is socially better.
- (Defect, Defect) is strategically stable.
Evolution of Trust Example
The Evolution of Trust by Nicky Case is a great interactive example for understanding why Nash equilibrium matters.
The game is based on the prisoner’s dilemma, but it extends the idea by making players interact repeatedly. This changes the incentives:
- In a one-shot prisoner’s dilemma, defecting is the dominant strategy, so (Defect, Defect) is the Nash equilibrium.
- In repeated interactions, cooperation can become rational because today’s defection can be punished in future rounds.
- Strategies like tit for tat can work well because they start by cooperating, reward cooperation, and punish defection.
This shows the difference between a single-game equilibrium and repeated-game behavior.
In the one-shot version, trust breaks down because each player has an incentive to defect.
In the repeated version, trust can emerge because the future changes the payoff calculation. If I know I will interact with you again, betraying you now may cost me later.
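A small simulation makes the cost of betrayal concrete. The sketch below is not taken from The Evolution of Trust; it assumes a fixed 10-round match, the payoffs 3, 0, 5, and 1 from the prisoner’s dilemma in this note, and hypothetical function names.

```python
# (my action, their action) -> (my payoff, their payoff); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated match and return the total score of each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(history_a, history_b)
        b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(a, b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(a)
        history_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

Against tit for tat, the defector wins the first round but is punished in every later round, ending with 14 points instead of the 30 it would have earned by cooperating throughout.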
The main lesson:
Cooperation is easier to sustain when players expect repeated interactions, can remember past behavior, and can punish betrayal.
This does not mean that "always cooperate" is automatically a Nash equilibrium. It means the structure of the game can make cooperative strategies stable when players care enough about future payoffs.
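One common way to make "care enough about future payoffs" precise is a discount factor. The sketch below rests on assumptions that go beyond this note: an infinitely repeated game, a discount factor `delta` between 0 and 1, and an opponent who cooperates until the first defection and then defects forever (often called grim trigger). It compares cooperating forever with defecting once and being punished afterwards, using the payoffs 3, 5, and 1 from the matrix in this note.

```python
def cooperate_forever(delta, reward=3):
    """Discounted value of earning `reward` in every round: reward / (1 - delta)."""
    return reward / (1 - delta)


def defect_once_then_punished(delta, temptation=5, punishment=1):
    """Earn `temptation` now, then `punishment` in every later round."""
    return temptation + delta * punishment / (1 - delta)


def cooperation_is_stable(delta):
    """True if a single defection does not pay against this punishing opponent."""
    return cooperate_forever(delta) >= defect_once_then_punished(delta)


print(cooperation_is_stable(0.4))  # False
print(cooperation_is_stable(0.6))  # True
```

With these payoffs the condition $3/(1-\delta) \ge 5 + \delta/(1-\delta)$ simplifies to $\delta \ge 1/2$: under the stated assumptions, cooperation holds only if a player values the next round at least half as much as the current one.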
Pure vs Mixed Strategy
Pure strategy - A player chooses one action with probability 1.
Example: always defect.
Mixed strategy - A player randomizes between actions with certain probabilities.
Example: in rock paper scissors, playing rock, paper, and scissors each with probability $1/3$.
In a mixed-strategy Nash equilibrium, each player randomizes in a way that makes the other player indifferent between the actions they play with positive probability. Otherwise that player would drop the worse action and stop randomizing.
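The indifference condition can be checked directly for rock paper scissors. The sketch below assumes the usual scoring of +1 for a win, -1 for a loss, and 0 for a tie; the helper names and the `rock_heavy` mix are made up for illustration.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoffs(opponent_mix):
    """Expected payoff of each pure action against the opponent's mix."""
    return {
        mine: sum(prob * payoff(mine, theirs) for theirs, prob in opponent_mix.items())
        for mine in ACTIONS
    }


uniform = {action: Fraction(1, 3) for action in ACTIONS}
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}

indifferent = expected_payoffs(uniform)         # every action earns 0
not_indifferent = expected_payoffs(rock_heavy)  # paper earns 1/4, scissors earns -1/4
```

Against the uniform mix every action earns the same, so no pure action is a profitable deviation. Against the rock-heavy mix the opponent is no longer indifferent: paper is strictly best, so that mix cannot be part of an equilibrium.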
How to Find a Nash Equilibrium in a Payoff Matrix
- For each possible strategy of Player B, mark Player A’s best response.
- For each possible strategy of Player A, mark Player B’s best response.
- Any cell where both players are playing best responses is a Nash equilibrium.
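The three steps above can be sketched in Python as follows. The function name and the nested-list layout (rows are Player A’s strategies, columns are Player B’s) are assumptions for illustration. The coordination game and matching pennies matrices are standard textbook examples added here to show the "many" and "none" cases; they are not from this note.

```python
def pure_nash_equilibria(payoffs_a, payoffs_b):
    """Return the (row, column) cells where both players are best responding."""
    rows, cols = len(payoffs_a), len(payoffs_a[0])

    # Step 1: for each column (B's strategy), mark A's best-response rows.
    best_for_a = set()
    for c in range(cols):
        top = max(payoffs_a[r][c] for r in range(rows))
        best_for_a |= {(r, c) for r in range(rows) if payoffs_a[r][c] == top}

    # Step 2: for each row (A's strategy), mark B's best-response columns.
    best_for_b = set()
    for r in range(rows):
        top = max(payoffs_b[r][c] for c in range(cols))
        best_for_b |= {(r, c) for c in range(cols) if payoffs_b[r][c] == top}

    # Step 3: a cell marked for both players is a pure-strategy Nash equilibrium.
    return sorted(best_for_a & best_for_b)


# Prisoner's dilemma from this note; index 0 = cooperate, 1 = defect.
dilemma = pure_nash_equilibria([[3, 0], [5, 1]], [[3, 5], [0, 1]])
# Illustrative coordination game: both players want to match.
coordination = pure_nash_equilibria([[2, 0], [0, 1]], [[2, 0], [0, 1]])
# Matching pennies: A wants to match, B wants to mismatch.
pennies = pure_nash_equilibria([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])

print(dilemma)       # [(1, 1)]
print(coordination)  # [(0, 0), (1, 1)]
print(pennies)       # []
```

This method only finds pure-strategy equilibria. An empty result, as in matching pennies, means any equilibrium must involve mixed strategies.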
Common Misconceptions
Nash equilibrium means everyone is happy - False. A Nash equilibrium can be inefficient or bad for everyone.
Nash equilibrium means the best total outcome - False. It is about individual incentives, not total welfare.
Nash equilibrium means no one can improve at all - False. Players might improve if they coordinate and change together. Nash equilibrium only blocks profitable one-player deviations.
Nash equilibrium means the strategy cannot be exploited - Not always. In two-player zero-sum games, an equilibrium strategy guarantees at least the value of the game in expectation, whatever the opponent does. In general-sum games there is no such guarantee: the strategy is only a best response when the others also play their equilibrium strategies.
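For the zero-sum case, the guarantee can be measured as a worst-case payoff. The sketch below assumes rock paper scissors scored +1, 0, -1, whose game value is 0; the function name and the `rock_heavy` mix are illustrative. Checking only the opponent’s pure replies is enough, because a best reply to a fixed mix can always be found among the pure strategies.

```python
from fractions import Fraction

# Row player's payoff; the column player receives the negative (zero-sum).
# Row and column order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def worst_case_payoff(mix):
    """Lowest expected payoff of `mix` over all pure replies by the opponent."""
    return min(
        sum(prob * PAYOFF[row][col] for row, prob in enumerate(mix))
        for col in range(3)
    )


uniform = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(worst_case_payoff(uniform))     # 0
print(worst_case_payoff(rock_heavy))  # -1/4
```

The uniform mix never falls below the game value of 0, while the rock-heavy mix loses 1/4 per round on average to an opponent who always plays paper.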
Connection to Poker and CFR
In two-player zero-sum imperfect-information games like heads-up poker, a Nash equilibrium is useful because it gives a strategy that cannot be profitably exploited by an opponent in the long run.
Algorithms like Counterfactual Regret Minimization try to approximate Nash equilibrium strategies through repeated self-play and regret minimization.
This is why an equilibrium strategy is often called a defensive strategy: it protects against exploitation, but it may not make the most money against weak opponents who make predictable mistakes.
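The regret-minimization idea can be illustrated on a much smaller game than poker. The sketch below is plain regret matching in self-play on rock paper scissors, not Counterfactual Regret Minimization itself, which applies a similar update at every decision point of a sequential game. It uses exact expected payoffs instead of sampled actions, and the starting bias toward rock is a hypothetical choice so that self-play does not begin at the equilibrium.

```python
# Row player's payoff in rock paper scissors (order: rock, paper, scissors).
# The game is zero-sum, so the column player's payoff is the negative.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
N = 3


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / N] * N


def train(iterations=20000):
    """Run self-play and return each player's average strategy."""
    # Hypothetical starting bias: the row player begins by regretting not playing rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        row = strategy_from_regrets(regrets[0])
        col = strategy_from_regrets(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(N)) for a in range(N)]
        col_values = [sum(-PAYOFF[a][b] * row[a] for a in range(N)) for b in range(N)]
        row_expected = sum(row[a] * row_values[a] for a in range(N))
        col_expected = sum(col[b] * col_values[b] for b in range(N))
        for a in range(N):
            # Regret: how much better action a would have done than the mix played.
            regrets[0][a] += row_values[a] - row_expected
            regrets[1][a] += col_values[a] - col_expected
            strategy_sums[0][a] += row[a]
            strategy_sums[1][a] += col[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_row, average_col = train()
print(average_row, average_col)  # both should be close to [1/3, 1/3, 1/3]
```

The strategy played in any single iteration can swing widely. It is the average strategy over all iterations that approaches the equilibrium, which is why the sketch returns averages and not the final mix.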