What question did this study set out to answer?

The study evaluates whether populations of large language model (LLM) agents reproduce the predictions of evolutionary game theory across several games and network structures.

May 30, 2026 · Open Access

Evolutionary Game Theory Without Evolution: Emergent Equilibria in LLM Agent Populations on Networks

Key Points

Tested three 2 × 2 games (Hawk-Dove, Stag Hunt, and pure coordination) across five network topologies.
Conducted ∼37,500 game decisions over 150 independent trials.
Formulated four falsifiable propositions relating to game outcomes and equilibrium frequencies.
Frontier LLM populations approximate the Hawk-Dove ESS at V/C = 2/3 on the complete graph, with the equilibrium Hawk frequency declining as network sparsity or clustering increases (one-way ANOVA across the five topologies: F(4, 20) = 7.19, p = 0.0009).
Empirical Hawk frequency shows an attenuated response to V/C, with a slope of 0.50 ± 0.04, strongly rejecting the slope-1 null (t(38) = −13.35, p < 10^−15).
In Stag Hunt, all runs achieved 100% Stag, and pure coordination led to deterministic selection of the first listed action (29/29 runs).
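
The V/C benchmark in the Hawk-Dove key point can be checked numerically. The following is a minimal sketch, assuming the textbook Hawk-Dove payoff table and well-mixed replicator dynamics; it is not the paper's LLM experiment, and the function names, step size, and starting frequency are illustrative choices.

```python
def hawk_dove_payoffs(v, c):
    """Textbook Hawk-Dove payoffs to the row player (assumed form, with C > V)."""
    return {
        ("H", "H"): (v - c) / 2,  # two Hawks split V and share the cost C
        ("H", "D"): v,            # Hawk takes the full resource from a Dove
        ("D", "H"): 0.0,          # Dove retreats and gets nothing
        ("D", "D"): v / 2,        # two Doves share the resource
    }


def replicator_hawk_frequency(v, c, x0=0.1, dt=0.01, steps=20000):
    """Euler-integrate dx/dt = x(1 - x)(f_H - f_D) for the Hawk frequency x."""
    pay = hawk_dove_payoffs(v, c)
    x = x0
    for _ in range(steps):
        # Expected payoffs against a population with Hawk frequency x.
        f_hawk = x * pay[("H", "H")] + (1 - x) * pay[("H", "D")]
        f_dove = x * pay[("D", "H")] + (1 - x) * pay[("D", "D")]
        x += dt * x * (1 - x) * (f_hawk - f_dove)
    return x


if __name__ == "__main__":
    v, c = 2.0, 3.0  # V/C = 2/3, the ratio highlighted above
    print("simulated Hawk frequency:", replicator_hawk_frequency(v, c))
    print("theoretical V/C:", v / c)
```

Setting the two expected payoffs equal gives V − xC = 0, so the interior rest point is x = V/C, and the simulation should settle there from any interior starting frequency when C > V.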

Abstract

We test whether populations of large language model (LLM) agents reproduce the quantitative predictions of classical evolutionary game theory in three canonical 2 × 2 games (Hawk–Dove, Stag Hunt, and a pure coordination game) across five network topologies (complete, Erdős–Rényi, Barabási–Albert, Watts–Strogatz, and a 2D lattice). The Hawk–Dove game provides the headline test: classical theory (Maynard Smith, 1982) predicts an evolutionarily stable strategy (ESS) at Hawk frequency x*_H = V/C, a prediction confirmed for biological populations and for reinforcement learning agents on networks but not yet, to our knowledge, for frontier LLMs. We complement this with a Stag Hunt experiment that probes whether the Pareto-selection rule observed in dyadic LLM bargaining (Drakos, 2026) survives at population scale, and with a pure coordination experiment that benchmarks convention emergence. We formulate four falsifiable propositions and report empirical findings from ∼37,500 game decisions across 150 independent runs. Three findings emerge. First, frontier LLM populations approximate the Hawk–Dove ESS at V/C = 2/3 on the complete graph K_25 (5-seed mean 0.643, 95% CI [0.612, 0.674], which contains the theoretical 0.667), but the equilibrium frequency declines systematically as network sparsity or clustering increases (one-way ANOVA across the five topologies: F(4, 20) = 7.19, p = 0.0009). Second, a V/C-parameter sweep across five additional ratios reveals an attenuated parametric response: the empirical Hawk frequency tracks the theoretical V/C with slope 0.50 ± 0.04 (95% CI [0.43, 0.58]; R² = 0.83), strongly rejecting the slope-1 null (t(38) = −13.35, p < 10^−15). The LLM population compresses toward a 0.5 action balance regardless of payoff parameters, over-Hawking below V/C = 0.5 and under-Hawking above. Third, in Stag Hunt all twenty-five runs reach 100% Stag (perfect Pareto-selection), and in pure coordination the population deterministically selects whichever action is listed first in the prompt enumeration (29/29 runs).
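
The slope-1 test in the second finding can be illustrated as follows. This is a minimal sketch, assuming an ordinary least-squares fit of empirical Hawk frequency on V/C and a t statistic of the form (slope − 1) / SE; the abstract does not state the exact fitting procedure, and the data points below are made up for illustration, not taken from the paper.

```python
import math


def slope_t_statistic(x, y, null_slope=1.0):
    """OLS fit of y on x; returns (slope, standard error, t vs. null_slope, df)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(rss / df / sxx)
    return slope, se, (slope - null_slope) / se, df


# Hypothetical data: Hawk frequencies compressed toward 0.5 (NOT the paper's data).
vc_ratio = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]
hawk_freq = [0.37, 0.33, 0.47, 0.43, 0.57, 0.53, 0.67, 0.63]

if __name__ == "__main__":
    slope, se, t_stat, df = slope_t_statistic(vc_ratio, hawk_freq)
    print(f"slope={slope:.3f}, se={se:.3f}, t({df})={t_stat:.2f}")
```

A slope near 1 would mean the population tracks V/C one-for-one. A slope near 0.5, with fitted values crossing the identity line around V/C = 0.5, corresponds to the over-Hawking below and under-Hawking above that the abstract describes.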
