August 19, 2025 · Open Access

Can We Trust the Machine? LLMs Mimic Human Expected Utility Theory Violations and Its Impact on Decision and Negotiation Systems

Teemu Alexander Puutio (Harvard University Press), Matthew Do (Harvard University Press)

Key Points

Average violation rates in decision-making by LLMs mirror human studies, reaching as high as 36% for independence.
Larger LLMs demonstrated human-like irrational patterns, revealing flaws in their decision-support capabilities.
Evaluation tested four LLMs using classic behavioral tasks, including Allais-type lotteries and hyperbolic discounting.
Proposed safeguards include the use of EUT-constrained reasoning to enhance AI decision reliability in critical applications.
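To make the independence test concrete, here is a minimal sketch of how an Allais-type choice pair could be scored. It uses the textbook Allais lotteries, which is an assumption: the paper's actual prompts and payoffs are not given here. The names `LOTTERIES`, `eu_gap`, `violates_independence`, and `violation_rate` are hypothetical. The key fact is that the expected-utility gap between 1A and 1B equals the gap between 2A and 2B for any utility function, so a strict preference for 1A paired with 2B (or 1B with 2A) cannot be rationalized under EUT.

```python
from fractions import Fraction as F

# Textbook Allais lotteries as {payoff in $ millions: probability}.
# Assumption: the paper's exact prompts and payoffs may differ.
LOTTERIES = {
    "1A": {1: F(1)},
    "1B": {0: F(1, 100), 1: F(89, 100), 5: F(10, 100)},
    "2A": {0: F(89, 100), 1: F(11, 100)},
    "2B": {0: F(90, 100), 5: F(10, 100)},
}


def expected_utility(lottery, u):
    """Expected utility of a lottery under utility function u."""
    return sum(p * u(x) for x, p in lottery.items())


def eu_gap(first, second, u):
    """EU(first) - EU(second) for two named lotteries."""
    return expected_utility(LOTTERIES[first], u) - expected_utility(LOTTERIES[second], u)


def violates_independence(choice_1, choice_2):
    """True when the pair mixes an 'A' choice with a 'B' choice.

    Assumes strict preferences; an indifferent agent could pick either.
    """
    return (choice_1 == "1A") != (choice_2 == "2A")


def violation_rate(choice_pairs):
    """Share of (choice_1, choice_2) pairs that violate independence."""
    flags = [violates_independence(a, b) for a, b in choice_pairs]
    return sum(flags) / len(flags)


# The two gaps coincide for any u; checked here for two arbitrary utilities.
for u in (lambda x: x, lambda x: x * x):
    assert eu_gap("1A", "1B", u) == eu_gap("2A", "2B", u)
```

In an audit like the one described, each model completion would be parsed into a choice pair and passed to `violation_rate`; the parsing step is omitted because the paper's response format is not specified here.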

Abstract

Expected Utility Theory (EUT) has long served as a benchmark for rational decision-making, with well-documented human deviations in the form of framing effects, time inconsistency, and violations of independence and sequential rationality. In this study, we extend the EUT audit framework to large language models (LLMs), evaluating whether these increasingly embedded systems behave as rational decision-support agents.

Using a four-part audit battery adapted from classic behavioral economics experiments, we tested four popular LLMs (GPT-4o, GPT-3.5, GPT-Mini, and DeepSeek) on 1,600 completions. Each model was evaluated on independence (via Allais-type lotteries), time consistency (via hyperbolic discounting tasks), framing invariance (via gain/loss presentations), and sequential rationality (via last-round cooperation in a Prisoner’s Dilemma).

Across models, we observed strikingly human-like patterns of irrationality. Average violation rates were 36% for independence, 34% for time inconsistency, 33% for framing effects, and 32% for sequential rationality, closely mirroring human laboratory data. The consistency of these effects across architectures suggests that bounded rationality is not incidental, but an emergent feature of next-token prediction objectives.

Our findings suggest that while LLMs may simulate rational discourse, their decision logic remains vulnerable to the same cognitive biases that affect humans. We propose safeguards including EUT-constrained reasoning chains, hybrid human-AI assemblage architectures pairing LLMs with deterministic systems and humans in group negotiation settings, and open benchmarking of AI decision reliability. As LLMs become embedded in negotiation, credit, and policy workflows, understanding and constraining their rationality becomes an AI governance imperative.
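The time-consistency task can be illustrated with a small sketch of the preference reversal that hyperbolic discounting produces. This is not the paper's audit code. The amounts, delays, and discount parameters are illustrative assumptions, and the function names are hypothetical. Under hyperbolic discounting a reward is valued at amount / (1 + k · delay), so adding the same front-end delay to both options can flip the choice. Under exponential discounting the ranking never changes.

```python
def hyperbolic_value(amount, delay, k):
    """Present value under hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1 + k * delay)


def exponential_value(amount, delay, delta):
    """Present value under exponential discounting: amount * delta ** delay."""
    return amount * delta ** delay


def prefers_later(value_fn, sooner, later, front_delay):
    """True if the larger-later reward is valued above the smaller-sooner one.

    sooner and later are (amount, delay) pairs; front_delay shifts both.
    """
    (amount_s, delay_s), (amount_l, delay_l) = sooner, later
    return value_fn(amount_l, delay_l + front_delay) > value_fn(amount_s, delay_s + front_delay)


def shows_reversal(value_fn, sooner, later, front_delay):
    """True if the choice flips once both options are pushed into the future."""
    return prefers_later(value_fn, sooner, later, 0) != prefers_later(
        value_fn, sooner, later, front_delay
    )


# Illustrative parameters (assumptions): delays in weeks, k = 0.2, delta = 0.9.
def hyperbolic_agent(amount, delay):
    return hyperbolic_value(amount, delay, k=0.2)


def exponential_agent(amount, delay):
    return exponential_value(amount, delay, delta=0.9)


sooner, later = (100, 0), (110, 1)  # $100 now vs. $110 in one week
print(shows_reversal(hyperbolic_agent, sooner, later, front_delay=52))   # True
print(shows_reversal(exponential_agent, sooner, later, front_delay=52))  # False
```

The hyperbolic agent takes $100 now over $110 next week, yet prefers $110 in 53 weeks to $100 in 52 weeks. That flip is the time inconsistency the audit counts as a violation.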
