What type of study is this?

This is a qualitative study.

What question did this study set out to answer?

This research aims to understand how negotiation protocols affect equilibrium outcomes in LLM coordination games.

May 6, 2026 | Open Access

The Reasoning Paradox in LLM Coordination Games: A Quantal-Response Mechanism Design Perspective with an Empirical Falsification and the Identification of Pareto-Selection

Key Points

This research aims to understand how negotiation protocols affect equilibrium outcomes in LLM coordination games.
Modeled LLM agents as logit quantal responders with varying rationality parameters.
Conducted empirical tests across 480 LLM-mediated dialogues using various protocols.
Analyzed outcomes based on Nash equilibria selection.
Structured protocols did not improve collective welfare as hypothesized.
The Pareto-dominant action profile was selected in 100% of trials.
Qualitative analysis showed correct identification of Nash equilibria in dialogues.
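
As a rough illustration of the logit quantal-response model named in the key points, the sketch below computes a symmetric logit quantal response equilibrium (QRE) for a 2x2 coordination game and follows it as the rationality parameter λ grows. The payoff numbers, the function names, and the bisection solver are assumptions made for this example and are not taken from the paper. For this particular payoff matrix the branch that starts at uniform play (λ = 0) stays in [0, 0.5], which is why a plain bisection on that interval is enough here; other games may need a proper continuation method.

```python
import math

# Hypothetical stag-hunt payoffs for the row player (not from the paper).
# A is the payoff-dominant action, B is the risk-dominant action.
U = {("A", "A"): 4.0, ("A", "B"): 0.0, ("B", "A"): 3.0, ("B", "B"): 3.0}


def logit_response(p, lam):
    """Probability of playing A when the opponent plays A with probability p."""
    u_a = p * U[("A", "A")] + (1 - p) * U[("A", "B")]
    u_b = p * U[("B", "A")] + (1 - p) * U[("B", "B")]
    return 1.0 / (1.0 + math.exp(-lam * (u_a - u_b)))


def principal_qre(lam, iters=60):
    """Symmetric logit QRE on the branch that starts at p = 0.5 for lam = 0.

    Assumes the payoffs above, for which logit_response(p) - p is positive
    at p = 0 and non-positive at p = 0.5, so bisection on [0, 0.5] applies.
    """
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if logit_response(mid, lam) > mid:
            lo = mid  # response exceeds p: the fixed point lies above mid
        else:
            hi = mid  # response at or below p: the fixed point lies at or below mid
    return 0.5 * (lo + hi)


# Probability of the payoff-dominant action A as lam increases.
for lam in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"lambda={lam:>4}: P(A)={principal_qre(lam):.6f}")
```

In this toy game the probability of A falls as λ rises, so play along this branch moves toward the risk-dominant profile (B, B), which is the direction of the comparative-statics result described in the abstract.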

Abstract

We study how the negotiation protocol governing two large-language-model (LLM) agents shapes which Nash equilibrium they realise in coordination games with multiple equilibria. We model each LLM agent as a logit quantal responder whose effective rationality parameter λ_P is a property of the protocol P, not of the agent in isolation: structured protocols that elicit explicit chain-of-thought reasoning are hypothesised to induce higher λ_P than single-shot simultaneous moves. Within this framework we prove a comparative-statics result: along the principal branch of the symmetric quantal response equilibrium, increasing λ drives the realised play toward the risk-dominant pure equilibrium in the Harsanyi–Selten basin sense, regardless of payoff dominance. This yields a counterintuitive consequence we call the reasoning paradox: in coordination games where the risk-dominant equilibrium is payoff-dominated, structured protocols would predictably reduce collective welfare relative to less structured protocols. We pre-registered this prediction and ran an empirical test across 480 LLM-mediated dialogues spanning two coordination games (one aligned, one misaligned), three protocols (simultaneous, alternating offers, structured-with-reasoning), and two frontier Claude models (Sonnet 4.6, Opus 4.7). The data decisively reject the prediction: the Pareto-dominant action profile is selected in 100% of trials, with zero variance and instant first-round acceptance in every negotiation-protocol run. The principal-branch QRE-as-protocol-regulator hypothesis is therefore falsified at total-variation distance 0.75 in the misaligned game; the operative equilibrium-selection rule for current frontier Claude models is deterministic Pareto-efficiency selection, consistent with the secondary-branch multiplicity result (Theorem 3) but not with the principal-branch limit (Theorem 2).
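
For readers unfamiliar with the metric, the total-variation distance can be written out as follows. This is a sketch under an assumption the abstract does not state explicitly: that the distance compares the observed distribution over action profiles, written here as \hat{p}, with the predicted distribution q, and that x^{*} denotes the Pareto-dominant profile. These symbols are introduced for illustration only.

```latex
\mathrm{TV}(\hat{p}, q) = \frac{1}{2} \sum_{x} \lvert \hat{p}(x) - q(x) \rvert,
\qquad
\hat{p} = \delta_{x^{*}} \;\Rightarrow\; \mathrm{TV}(\hat{p}, q) = 1 - q(x^{*}).
```

Under that reading, an observed point mass on x^{*} (selection in 100% of trials) together with a distance of 0.75 would correspond to a prediction that placed probability 0.25 on x^{*}.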
Qualitative analysis of the structured-reasoning transcripts confirms this is not a confusion artifact: in 100% of dialogues both Nash equilibria are correctly identified, and (B, B) is explicitly considered and rejected on Pareto-efficiency grounds in 70% of misaligned-game cases. We translate the empirical guidance into four reusable code patterns (single-shot parallel, isolated parallel, asymmetric framing, decision helper), bundled as a runnable Python module released alongside the paper, with a canonical demo achieving coordination between two LLM agents in two API calls and ~1.7 seconds, saving roughly 70% of tokens versus the multi-round structured-reasoning baseline.
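
The single-shot parallel pattern mentioned above can be pictured roughly as follows. This is not the module released with the paper: the names `PAYOFFS`, `stub_agent`, and `single_shot_parallel` are hypothetical, the payoffs are made up, and `stub_agent` is a local stand-in for one LLM API call that simply picks its own action from the profile with the highest joint payoff. The point of the sketch is only the shape of the pattern: each agent is queried once, concurrently, with no messages exchanged, for two calls in total.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (row, column) payoffs per action profile; not from the paper.
# (A, A) is payoff-dominant, (B, B) is risk-dominant.
PAYOFFS = {
    ("A", "A"): (4, 4), ("A", "B"): (0, 3),
    ("B", "A"): (3, 0), ("B", "B"): (3, 3),
}


def stub_agent(role, payoffs):
    """Stand-in for a single LLM call (assumption, no network access).

    Returns this agent's action in the profile with the highest joint payoff.
    """
    best_profile = max(payoffs, key=lambda profile: sum(payoffs[profile]))
    return best_profile[0] if role == "row" else best_profile[1]


def single_shot_parallel(agent, payoffs):
    """Query both agents once, in parallel, and score the resulting profile."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        row_future = pool.submit(agent, "row", payoffs)
        col_future = pool.submit(agent, "column", payoffs)
        profile = (row_future.result(), col_future.result())
    return profile, payoffs[profile]


profile, payoff = single_shot_parallel(stub_agent, PAYOFFS)
print(profile, payoff)
```

To try this with a real model, `stub_agent` would be replaced by a function that sends the payoff table to an LLM and parses the chosen action from the reply; that replacement is left out here because the paper's prompts and API usage are not given in this summary.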
