We study how the negotiation protocol governing two large-language-model (LLM) agents shapes which Nash equilibrium they realise in coordination games with multiple equilibria. We model each LLM agent as a logit quantal responder whose effective rationality parameter λ_P is a property of the protocol P, not of the agent in isolation: structured protocols that elicit explicit chain-of-thought reasoning are hypothesised to induce higher λ_P than single-shot simultaneous moves. Within this framework we prove a comparative-statics result: along the principal branch of the symmetric quantal response equilibrium, increasing λ drives the realised play toward the risk-dominant pure equilibrium in the Harsanyi–Selten basin sense, regardless of payoff dominance. This yields a counterintuitive consequence we call the reasoning paradox: in coordination games where the risk-dominant equilibrium is payoff-dominated, structured protocols would predictably reduce collective welfare relative to less structured protocols. We pre-registered this prediction and ran an empirical test across 480 LLM-mediated dialogues spanning two coordination games (one aligned, one misaligned), three protocols (simultaneous, alternating offers, structured-with-reasoning), and two frontier Claude models (Sonnet 4.6, Opus 4.7). The data decisively reject the prediction: the Pareto-dominant action profile is selected in 100% of trials, with zero variance and instant first-round acceptance in every negotiation-protocol run. The principal-branch QRE-as-protocol-regulator hypothesis is therefore falsified at total-variation distance 0.75 in the misaligned game; the operative equilibrium-selection rule for current frontier Claude models is deterministic Pareto-efficiency selection, consistent with the secondary-branch multiplicity result (Theorem 3) but not with the principal-branch limit (Theorem 2).
Qualitative analysis of the structured-reasoning transcripts confirms this is not a confusion artifact: in 100% of dialogues both Nash equilibria are correctly identified, and (B, B) is explicitly considered and rejected on Pareto-efficiency grounds in 70% of misaligned-game cases. We translate the empirical guidance into four reusable code patterns (single-shot parallel, isolated parallel, asymmetric framing, decision helper), bundled as a runnable Python module released alongside the paper, with a canonical demo achieving coordination between two LLM agents in two API calls and ~1.7 seconds, saving roughly 70% of tokens versus the multi-round structured-reasoning baseline.
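
The principal-branch comparative statics described above can be illustrated with a minimal sketch. It is not the paper's released module: the stag-hunt payoffs, the function names, and the bisection solver are all assumptions chosen for illustration. With these hypothetical payoffs, A is payoff-dominant and B is risk-dominant, so the sketch shows the misaligned case in which the symmetric logit QRE that starts at uniform play moves toward B as λ grows.

```python
import math

# Hypothetical stag-hunt payoffs for the row player (not taken from the paper).
# A is the payoff-dominant action, B the risk-dominant one.
PAYOFF = {("A", "A"): 4.0, ("A", "B"): 0.0, ("B", "A"): 3.0, ("B", "B"): 3.0}


def advantage_of_a(p):
    """Expected payoff of A minus that of B when the opponent plays A with probability p."""
    eu_a = p * PAYOFF[("A", "A")] + (1 - p) * PAYOFF[("A", "B")]
    eu_b = p * PAYOFF[("B", "A")] + (1 - p) * PAYOFF[("B", "B")]
    return eu_a - eu_b


def logit_response(p, lam):
    """Logit probability of playing A against an opponent who plays A with probability p."""
    return 1.0 / (1.0 + math.exp(-lam * advantage_of_a(p)))


def principal_branch_qre(lam, tol=1e-12):
    """Symmetric logit QRE on the branch that starts at uniform play (p = 0.5 at lam = 0).

    Assumption specific to these payoffs: B is the better reply to uniform play,
    so the branch stays inside [0, 0.5] and bisection on that bracket suffices.
    """
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if logit_response(mid, lam) > mid:
            lo = mid  # response exceeds mid, so the fixed point lies above mid
        else:
            hi = mid  # response is at or below mid, so the fixed point lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"lambda={lam:5.1f}  P(A)={principal_branch_qre(lam):.4f}")
```

In this toy game the probability of the payoff-dominant action A falls from 0.5 toward 0 as λ increases, which is the direction of the principal-branch prediction that the experiments reject. Other payoff matrices may need a different bracket or a continuation solver in place of the fixed bisection interval.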
Stefanos Drakos (Mon,) studied this question.