ABSTRACT
Large language models (LLMs) are increasingly considered for deployment in applications requiring strategic judgment under uncertainty. Yet it remains unclear whether their behavior in adversarial environments resembles normative decision‐making, human strategic behavior, or something qualitatively distinct from both. This study addresses that question using a controlled attacker–defender signaling game in which an attacker must interpret potentially deceptive defender announcements and decide whether to attack one of two targets or abstain. We develop a three‐way comparison framework that evaluates GPT‐4o against two benchmarks simultaneously: a normative Bayesian best‐response model and empirical human decisions drawn from a matched experimental data set. Critically, we decompose strategic behavior into two components, belief formation and action selection, to identify whether similarities and divergences across agent types arise at the level of probabilistic inference, behavioral choice, or both. The results provide partial support for normative alignment (H1): GPT‐4o's modal action matches the normative benchmark in seven out of eight scenarios, yet its decision distributions diverge significantly in all conditions, driven by a systematic underutilization of the abort option (6.7% vs. the normative recommendation of 25.6%). Human similarity (H2) is not supported, with action frequency distributions differing significantly across all eight conditions. The core finding is a cognitive‐action decoupling: GPT‐4o maintains more diffuse posterior beliefs than humans in six out of eight scenarios yet produces more deterministic actions, and explicitly articulates uncertainty in 14%–28% of reasoning traces while systematically overriding that uncertainty in its final decisions. These findings position current LLMs as a strategically distinct class of agent, neither fully rational equilibrium players nor behavioral mimics of human bounded rationality.
The observed commission bias and belief‐action decoupling have direct implications for the deployment of LLMs in high‐stakes adversarial roles, where abstention under uncertainty is often the strategically rational choice.
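The two components separated in the abstract, belief formation and action selection, can be illustrated with a minimal sketch. It is not the paper's model: the game structure (one of two targets is protected, and the defender's announcement is truthful with probability `q`), the payoffs (`reward`, `penalty`, `abort_payoff`), and every function name below are assumptions chosen for illustration. The sketch shows why abstaining can be the expected-payoff-maximizing choice when the posterior is diffuse, and uses Shannon entropy as one possible measure of how diffuse a belief is.

```python
import math

def posterior(prior, likelihood):
    """Belief formation via Bayes' rule: P(state | signal) is proportional to
    P(signal | state) * P(state). Both arguments are dicts keyed by state."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

def best_response(belief, reward=1.0, penalty=2.0, abort_payoff=0.0):
    """Action selection: pick the action with the highest expected payoff.

    belief[t] is the probability that target t is protected (hypothetical
    setup). Attacking t succeeds only if t is unprotected.
    """
    expected = {
        "attack_" + t: (1 - p) * reward - p * penalty
        for t, p in belief.items()
    }
    expected["abort"] = abort_payoff
    action = max(expected, key=expected.get)
    return action, expected

def entropy_bits(belief):
    """Shannon entropy in bits; higher values mean a more diffuse belief."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def belief_after_announcement(q):
    """Posterior over the protected target after the defender announces
    'A is protected', assuming a uniform prior and truthfulness q."""
    prior = {"A": 0.5, "B": 0.5}
    likelihood = {"A": q, "B": 1 - q}  # P(announce "A" | protected target)
    return posterior(prior, likelihood)

# A weakly informative announcement leaves the belief diffuse.
weak = belief_after_announcement(q=0.6)
# A strongly informative announcement concentrates the belief.
strong = belief_after_announcement(q=0.9)

weak_action, weak_payoffs = best_response(weak)
strong_action, strong_payoffs = best_response(strong)
```

Under these assumed payoffs, the weakly informative announcement makes both attacks negative in expectation, so the best response is to abort, while the strongly informative one favors attacking the target believed to be unprotected. An agent that holds the diffuse belief but attacks anyway would show the kind of belief-action decoupling the abstract describes.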
Unal-Eyi et al. studied this question.