What question did this study set out to answer?

This study addresses a gap in multi-agent AI monitoring: it identifies relational failure modes that monitoring of individual agents cannot detect or correctly diagnose.

June 27, 2026 · Open Access

Beyond Individual Agent Monitoring: Empirical Evidence of Three Relational Failure Modes in LLM-Based Multi-Agent Systems

Key Points

Studied a four-layer architecture across twelve logged sessions in three experimental stages, run between May 9 and 12, 2026.
Characterized three relational failure modes (FM1, FM2, FM3) and tested the effects of a containment layer on agent behavior.
Implemented a human-in-the-loop mechanism for constitutional revisions during live sessions.
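FM2 was entirely relational and invisible to individual monitoring; for FM3, individual monitoring flagged the symptom but misattributed the cause. FM1, the control condition, was detectable by individual monitoring.
With the containment layer's hard limit, constitutional purpose held across all five turns under adversarial conditions; without it, the same scenario produced a constitutional violation at Turn 3.
The AI oversight agent identified gaps in its own governing document and escalated them to human review, completing two constitutional revision cycles in live sessions.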

Abstract

Current multi-agent AI safety architectures monitor individual agents. We argue this is structurally insufficient: there exist failure modes that are invisible to individual agent monitoring and require a dedicated relational integrity layer to detect. This paper presents an empirical study of a four-layer architecture designed to test that claim, conducted across twelve logged sessions in three experimental stages between May 9 and 12, 2026. We identify and characterise three relational failure modes. FM1 (Individual Constitutional Violation) is detectable by individual monitoring and serves as our control condition. FM2 (Mutual Recursive Abstraction) occurs when two constitutionally correct agents produce a relationally collapsed interaction by engaging each other's intellectual agenda rather than the human beneficiary's situation; neither agent violates its individual constitutional purpose, and the failure is entirely relational. FM3 (Shadow Serve) occurs when one agent displaces the declared human beneficiary with an undeclared beneficiary while appearing to collaborate correctly; individual monitoring flags the symptom but misattributes the cause and prescribes the wrong intervention. We demonstrate empirically that a containment layer changes agent behaviour under adversarial conditions: the same scenario without the hard limit produced a constitutional violation at Turn 3; with the hard limit, constitutional purpose held across all five turns. We further describe a human-in-the-loop constitutional revision mechanism in which the AI oversight agent identifies gaps in its own governing document and escalates them to human review, completing two revision cycles in live sessions. The work is framed as a documented proof-of-concept. The empirical findings are preliminary, the sample sizes are small, and the architecture is tested under cooperative rather than adversarial conditions. We name the open problems and the validation steps required to establish robustness.
This is the third preprint in the Polarity Model series. It builds on the framework (Vimberg 2026a, DOI 10.5281/zenodo.20070638) and the architecture (Vimberg 2026b, DOI 10.5281/zenodo.20072035). Session logs, constitutional documents, and the multi-AI peer review record are available on request for independent inspection.
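
The distinction between individual and relational monitoring can be illustrated with a minimal sketch. It is not the paper's architecture. The `Turn` fields, both monitor functions, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the per-turn labels are assumed to come from some upstream judgment that is not shown. The sketch only shows how every turn can pass its own per-agent check while the interaction as a whole drifts away from the beneficiary, as in the FM2 description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    agent: str
    within_purpose: bool  # hypothetical per-agent constitutional verdict
    addresses: str        # hypothetical label: "beneficiary" or "peer"


def individual_monitor(turns):
    """Return agents with at least one per-agent violation (FM1-style check)."""
    return sorted({t.agent for t in turns if not t.within_purpose})


def relational_monitor(turns, min_beneficiary_share=0.5):
    """Flag the interaction when too few turns address the beneficiary.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if not turns:
        return False
    share = sum(t.addresses == "beneficiary" for t in turns) / len(turns)
    return share < min_beneficiary_share


# An FM2-like session: each turn is individually acceptable, but the
# agents mostly engage each other's agenda rather than the beneficiary.
session = [
    Turn("A", True, "beneficiary"),
    Turn("B", True, "peer"),
    Turn("A", True, "peer"),
    Turn("B", True, "peer"),
    Turn("A", True, "peer"),
]

individual_flags = individual_monitor(session)  # no agent is flagged
relational_flag = relational_monitor(session)   # the interaction is flagged
```

In this toy session the per-agent check reports nothing, while the interaction-level check fires because only one of five turns addresses the beneficiary. A real relational integrity layer would need a far richer signal than a single share threshold.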
