What question did this study set out to answer?

This research aims to establish a governance framework for autonomous AI agents in software engineering.

March 1, 2026 · Open Access

Governing Autonomous AI Agents in Production Software Engineering: Structural Invariants, Context Degradation Signals, and Bounded Recovery

Key Points

Proposes a governance framework for autonomous AI agents that build and operate production software.
Developed the framework while operating ten AI agent personas.
The personas built a 62-table SaaS platform with billing, authentication, and multi-tenant isolation.
Identified eight failure modes that are absent from traditional software development.
Addressed each failure mode with structural mechanisms the agent cannot circumvent, such as fresh-context phase boundaries and grade-before-fix audits.
Organized the mechanisms into seven process layers and a five-phase deterministic workflow.
Packaged the framework as a transferable template.

Abstract

We present a governance framework for autonomous AI agents that build and operate production software. The framework emerged from operating ten AI agent personas to construct a 62-table SaaS platform with billing, authentication, and multi-tenant isolation. We identified eight failure modes absent from traditional software development: context rot from conversation history compression, unbounded fix-break retry loops, cross-tenant query omissions, mutable financial records, non-idempotent writes, systematic self-assessment bias in code review classification, builder-auditor conflation when the same agent builds and reviews its own work, and model-capability mismatch when implementation-optimized models perform judgment tasks. We address each through structural mechanisms that the agent cannot circumvent: workflow phases enforce fresh-context boundaries so degradation cannot propagate across roles, a classification script assigns review tiers based on diff properties rather than agent self-report, errors are classified into four types before recovery begins with hard per-category iteration limits, multi-agent decisions use sequential written artifacts rather than group deliberation, financial tables enforce append-only semantics with reversal-entry corrections, all write operations require duplicate-call idempotency tests, model-specific capability requirements are enforced per workflow phase, and an asymmetric audit methodology requires grading before fixing with a published scorecard. The mechanisms are organized into seven process layers and a five-phase deterministic workflow, then packaged as a transferable template. We describe the design, its rationale, and its relationship to existing work in agent safety and software process frameworks.
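Two of the mechanisms named in the abstract, append-only financial tables with reversal-entry corrections and duplicate-call idempotency tests, can be illustrated with a small sketch. This is not the paper's implementation: the class names, the `idempotency_key` field, and the in-memory list standing in for a database table are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)  # frozen: a posted entry cannot be mutated afterwards
class LedgerEntry:
    entry_id: int
    idempotency_key: str  # hypothetical field; the paper does not name one
    account: str
    amount_cents: int
    reverses: Optional[int] = None  # entry_id this row corrects, if any


class AppendOnlyLedger:
    """In-memory stand-in for an append-only financial table (illustrative only)."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []
        self._by_key: Dict[str, LedgerEntry] = {}

    def post(self, key: str, account: str, amount_cents: int,
             reverses: Optional[int] = None) -> LedgerEntry:
        # A repeated call with the same key returns the first result
        # instead of writing a second row.
        existing = self._by_key.get(key)
        if existing is not None:
            return existing
        entry = LedgerEntry(len(self._entries) + 1, key, account,
                            amount_cents, reverses)
        self._entries.append(entry)
        self._by_key[key] = entry
        return entry

    def reverse(self, entry_id: int, key: str) -> LedgerEntry:
        # Corrections are new rows with the opposite sign; nothing is
        # updated or deleted.
        if not 1 <= entry_id <= len(self._entries):
            raise KeyError(f"no ledger entry with id {entry_id}")
        original = self._entries[entry_id - 1]
        return self.post(key, original.account, -original.amount_cents,
                         reverses=entry_id)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self._entries
                   if e.account == account)

    def entries(self) -> Tuple[LedgerEntry, ...]:
        return tuple(self._entries)  # read-only snapshot of the rows


# Duplicate-call idempotency check: the second call must not add a row.
ledger = AppendOnlyLedger()
first = ledger.post("charge-001", "tenant-a", 5000)
again = ledger.post("charge-001", "tenant-a", 5000)
assert first is again and len(ledger.entries()) == 1

# A mistaken charge is corrected by a reversal entry, not an edit.
ledger.reverse(first.entry_id, "reversal-001")
assert ledger.balance("tenant-a") == 0 and len(ledger.entries()) == 2
```

In a real system these invariants would more plausibly be enforced at the database layer, for example by revoking update and delete privileges on the table, so that the agent writing application code cannot bypass them. The sketch only shows the behavior such a constraint is meant to guarantee.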


Cite this study

Greg Arnold studied this question.

synapsesocial.com/papers/69a3d824ec16d51705d2ebd3 — DOI: https://doi.org/10.5281/zenodo.18794215
