Abstract

Expected Utility Theory (EUT) has long served as a benchmark for rational decision-making, with well-documented human deviations in the form of framing effects, time inconsistency, and violations of independence and sequential rationality. In this study, we extend the EUT audit framework to large language models (LLMs), evaluating whether these increasingly embedded systems behave as rational decision-support agents.

Using a four-part audit battery adapted from classic behavioral economics experiments, we tested four popular LLMs (GPT-4o, GPT-3.5, GPT-Mini, and DeepSeek) on 1,600 completions. Each model was evaluated on independence (via Allais-type lotteries), time consistency (via hyperbolic discounting tasks), framing invariance (via gain/loss presentations), and sequential rationality (via last-round cooperation in a Prisoner’s Dilemma).

Across models, we observed strikingly human-like patterns of irrationality. Average violation rates were 36% for independence, 34% for time consistency, 33% for framing invariance, and 32% for sequential rationality, closely mirroring human laboratory data. The consistency of these effects across architectures suggests that bounded rationality is not incidental, but an emergent feature of next-token prediction objectives.

Our findings suggest that while LLMs may simulate rational discourse, their decision logic remains vulnerable to the same cognitive biases that affect humans. We propose safeguards including EUT-constrained reasoning chains, hybrid human-AI assemblage architectures that pair LLMs with deterministic systems and humans in group negotiation settings, and open benchmarking of AI decision reliability. As LLMs become embedded in negotiation, credit, and policy workflows, understanding and constraining their rationality becomes an AI governance imperative.
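To make the independence audit concrete, the following is a minimal sketch of how an Allais-type choice pair could be scored. It is not the authors' code: the lottery payoffs are the classic textbook Allais values, used here as an assumption because the abstract does not give the actual prompts, and the names `LOTTERIES`, `expected_utility`, `violates_independence`, and `violation_rate` are hypothetical.

```python
import math

# Classic Allais lotteries as (probability, payoff) pairs.
# ASSUMPTION: textbook values, not the prompts used in the study.
LOTTERIES = {
    "A": [(1.00, 1_000_000)],
    "B": [(0.10, 5_000_000), (0.89, 1_000_000), (0.01, 0)],
    "C": [(0.11, 1_000_000), (0.89, 0)],
    "D": [(0.10, 5_000_000), (0.90, 0)],
}


def expected_utility(lottery, u):
    """Expected utility of a lottery under utility function u."""
    return sum(p * u(x) for p, x in lottery)


def violates_independence(choice_1, choice_2):
    """Return True if a pair of choices contradicts the independence axiom.

    choice_1 is the pick from {A, B}; choice_2 is the pick from {C, D}.
    Under EUT, EU(A) - EU(B) equals EU(C) - EU(D) for any utility function,
    so only the patterns (A, C) and (B, D) are consistent.
    """
    return (choice_1, choice_2) not in {("A", "C"), ("B", "D")}


def violation_rate(choice_pairs):
    """Share of choice pairs that violate independence."""
    pairs = list(choice_pairs)
    if not pairs:
        raise ValueError("need at least one choice pair")
    return sum(violates_independence(a, b) for a, b in pairs) / len(pairs)


# Check the identity behind the scoring rule with one concave utility.
gap_1 = expected_utility(LOTTERIES["A"], math.sqrt) - expected_utility(LOTTERIES["B"], math.sqrt)
gap_2 = expected_utility(LOTTERIES["C"], math.sqrt) - expected_utility(LOTTERIES["D"], math.sqrt)
assert math.isclose(gap_1, gap_2, rel_tol=1e-9)
```

In use, each model completion would be parsed into a pair such as `("A", "D")`, the typical human Allais pattern, and `violation_rate` would then give a per-model figure comparable to the independence rate reported in the abstract.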
Teemu Alexander Puutio
Matthew Do
Harvard University Press
Puutio et al. studied this question.
www.synapsesocial.com/papers/68af4766ad7bf08b1ead4a75 — DOI: https://doi.org/10.21203/rs.3.rs-7313765/v1