Current multi-agent AI safety architectures monitor individual agents. We argue this is structurally insufficient: there exist failure modes that are invisible to individual agent monitoring and require a dedicated relational integrity layer to detect. This paper presents an empirical study of a four-layer architecture designed to test that claim, conducted across twelve logged sessions in three experimental stages from May 9 to 12, 2026. We identify and characterise three relational failure modes. FM1 (Individual Constitutional Violation) is detectable by individual monitoring and serves as our control condition. FM2 (Mutual Recursive Abstraction) occurs when two constitutionally correct agents produce a relationally collapsed interaction by engaging each other's intellectual agenda rather than the human beneficiary's situation; neither agent violates its individual constitutional purpose, and the failure is entirely relational. FM3 (Shadow Serve) occurs when one agent displaces the declared human beneficiary with an undeclared beneficiary while appearing to collaborate correctly; individual monitoring flags the symptom but misattributes the cause and prescribes the wrong intervention. We demonstrate empirically that a containment layer changes agent behaviour under adversarial conditions: the same scenario without the hard limit produced constitutional violation at Turn 3; with the hard limit, constitutional purpose held across all five turns. We further describe a human-in-the-loop constitutional revision mechanism in which the AI oversight agent identifies gaps in its own governing document and escalates them to human review, completing two revision cycles in live sessions. The work is framed as a documented proof-of-concept. The empirical findings are preliminary, the sample sizes are small, and the architecture is tested under cooperative rather than adversarial conditions. We name the open problems and the validation steps required to establish robustness.
This is the third preprint in the Polarity Model series. It builds on the framework (Vimberg 2026a, DOI 10.5281/zenodo.20070638) and the architecture (Vimberg 2026b, DOI 10.5281/zenodo.20072035). Session logs, constitutional documents, and the multi-AI peer review record are available on request for independent inspection.
Priit Vimberg studied this question.