What question did this study set out to answer?

To investigate how different LLMs handle decision-making within autonomous ML environments.

April 30, 2026

Open Access

The Autonomous Sunk-Cost Fallacy: Stopping Failures and Meta-Reasoning in LLMs Deployed within the Autonomous Empirical Optimization System (AEOS)

Key Points

Investigated how different LLMs handle decision-making within autonomous ML environments.
Deployed 13 LLMs across different architectures in the AEOS sandbox.
Utilized the Extended-Horizon experimentation framework with high iteration limits.
Assessed models' ability to identify performance plateaus and terminate unproductive efforts.
Found general-purpose and frontier models succumb to the Autonomous Sunk-Cost Fallacy.
Demonstrated that instruction-tuned code models successfully recognize stagnation and terminate exploration.
Observed significant compute waste in unproductive loops by non-instruction-tuned models.

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in writing and executing code, leading to the development of autonomous agentic loops for Machine Learning (ML) engineering. However, when deployed autonomously without human intervention, these models exhibit distinct behavioral failure modes reminiscent of human cognitive biases. In this paper, we deploy 13 different LLMs, spanning frontier, general-purpose, and code-specialized architectures, into the Autonomous Empirical Optimization System (AEOS), a zero-human sandbox designed to autonomously solve ML pipelines. We introduce an "Extended-Horizon" experimentation framework in which agents are granted massive iteration limits and widened patience thresholds, testing their intrinsic ability to recognize performance plateaus and autonomously terminate prior to system-forced intervention. Our findings reveal that both premium frontier models and general-purpose local models suffer from a severe "Autonomous Sunk-Cost Fallacy," trapping themselves in unproductive loops and wasting significant compute. Conversely, we demonstrate that modern, instruction-tuned code models possess superior meta-reasoning alignment, allowing them to accurately identify stagnation and gracefully terminate exploration.
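
The abstract refers to iteration limits, patience thresholds, and plateau recognition without giving their mechanics. As a rough illustration only, the sketch below shows one common way such a stopping rule can be expressed. It is not taken from the paper or from AEOS: the names `PlateauMonitor`, `run`, `patience`, `min_delta`, and `max_iters` are hypothetical, and the score sequence is made-up illustrative data.

```python
from dataclasses import dataclass


@dataclass
class PlateauMonitor:
    """Hypothetical patience-based plateau detector (not the AEOS implementation)."""
    patience: int               # consecutive non-improving iterations tolerated
    min_delta: float = 0.0      # smallest gain that counts as an improvement
    best: float = float("-inf")
    stale: int = 0              # iterations since the last improvement

    def update(self, score: float) -> bool:
        """Record a new score; return True once the plateau rule says stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


def run(scores, patience, max_iters):
    """Replay a score sequence; report the stopping iteration and the reason."""
    monitor = PlateauMonitor(patience=patience)
    for i, score in enumerate(scores[:max_iters], start=1):
        if monitor.update(score):
            return i, "plateau"      # stopped because progress stalled
    return min(len(scores), max_iters), "limit"  # ran out of iterations


# Illustrative scores only: improvement stalls after the third iteration.
scores = [0.70, 0.75, 0.76, 0.76, 0.75, 0.76, 0.76]
stopped_at, reason = run(scores, patience=3, max_iters=100)
```

In this framing, widening `patience` and raising `max_iters` delays any forced stop, so whether a run ends early depends on the agent itself deciding that further iterations are unproductive.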

Authors

Sanskar Jajoo

HEM Technologies (United States)

Institutions

HEM Technologies (United States)
