What question did this study set out to answer?

The study aims to classify failure modes in autonomous agents and identify design tradeoffs between frameworks.

March 18, 2026 · Open Access

When Agents Fail: Design Tradeoffs Between Multi-Criteria Decision Frameworks and Autonomous AI Agents

Key Points

The authors classified 16 failure modes into four architecture-dependent categories.
They tested the taxonomy's applicability on the OWASP Top 10 for LLM Applications and the MITRE ATLAS adversarial technique taxonomy.
An inter-rater reliability assessment yielded a Cohen's kappa of 0.84.
Identified four main categories of failure modes: output proportionality, language-surface attacks, resource integrity, and emergent behaviors.
Most perceived advantages of non-agentic systems relate to narrower operational scope rather than structural design.
The taxonomy shows promise as a preliminary diagnostic framework for system designers.

Abstract

Shapira et al. (2026) documented 10 security vulnerabilities and 6 emergent safety behaviors across 6 unconstrained autonomous agents observed over 14 days in a Discord-based multi-agent environment (arXiv:2602.20021). We organize these 16 failure modes into a taxonomy of four architecture-dependent categories: (A) output proportionality failures arising from unbounded generative actions, (B) language-surface attacks enabled by natural language interfaces, (C) resource and state integrity failures from persistent mutable state, and (D) emergent multi-agent behaviors inherent to autonomous goal-forming systems. The principal finding is that most apparent "advantages" of non-agentic systems are consequences of narrower scope, not superior design: a non-conversational system avoids language attacks by not having a language interface; a stateless function avoids emergence by not having goals. Only Category A (output proportionality) and part of Category C (computational boundedness) appear to reflect structural properties rather than scope limitations. An inherent limitation is the asymmetry of the comparison: AEGIS is a scoring function that produces numbers, while agents are autonomous systems that take actions; many contrasts reflect this scope difference rather than an architectural insight. We do not claim that MCDA replaces agents, but propose that different architectures tend to exhibit different classes of failure, a modest observation that may nonetheless be useful as a preliminary diagnostic framework for system designers. We test the taxonomy's applicability beyond its derivation source by classifying the OWASP Top 10 for LLM Applications and the MITRE ATLAS adversarial technique taxonomy using three explicit decision criteria. An inter-rater reliability assessment yields Cohen's kappa of 0.84, providing preliminary evidence that the classification criteria are reproducible, though both raters are the paper's authors, and validation by naive external raters is needed.
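
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa is computed for two raters assigning items to the four categories (A to D). The function name `cohens_kappa` and the label lists `rater_a` and `rater_b` are hypothetical and invented for illustration; they are not the study's data and do not reproduce its reported value of 0.84.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight items (NOT the paper's data).
rater_a = ["A", "A", "B", "B", "C", "C", "D", "D"]
rater_b = ["A", "B", "B", "B", "C", "D", "D", "D"]

print(f"kappa = {cohens_kappa(rater_a, rater_b):.3f}")
```

Because kappa discounts the agreement two raters would reach by chance, it is lower than raw percent agreement: in this invented example the raters agree on 6 of 8 items, while chance alone would account for agreement on a quarter of them.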

Authors

Anderson Acosta de Paiva

Priscylla Lygia Boente do Nascimento
