What question did this study set out to answer?

This research aims to establish a reliable governability framework for AI agents to ensure their safe deployment in various applications.

July 4, 2026 | Open Access

Governable by Construction: The Governed Agent Doctrine - an implemented, adversarially audited architecture for AI agents

Key Points

The authors developed the Governed Agent Doctrine: five enforcement rules plus a governed memory substrate.
They ran an adversarial audit of the implemented, running system and report its results.
Each rule is enforced mechanically, outside the language model, at the tool and infrastructure layers.
Load-bearing behaviors held at 8/8 under adversarial probing during the audit.
The memory system's forgetting initially scored 0.5: it re-asserted a corrected fact under paraphrase until the guarantee was moved into the read path.
The paper presents the integration of these controls, and the discipline of auditing them, as its key contribution.

Abstract

The AI-agent field is optimizing the wrong variable. Enormous effort goes into making agents more capable — more tools, more autonomy, longer memory — while the property that decides whether an agent is safe to put near real data, money, or customers goes largely unbuilt: governability. The evidence is stark: in a systematic survey of 30 deployed agents, documented across 45 fields each, third-party safety testing is documented for only 3, and 25 of the 30 disclose no internal safety-evaluation results at all. The controls that do exist in the literature are fragmented — risk taxonomies (OWASP, NIST, Microsoft), process frameworks (NIST AI RMF, Google SAIF), vendor recommendations (Anthropic, OpenAI), and isolated point designs (CaMeL, execution isolation, guardrail toolkits). To our knowledge, no single publicly described system combines structural containment, mechanical tool-layer enforcement, budget and kill-switch controls, tamper-evident audited memory, and an independent critic — and then publishes what happened when it was adversarially tested. This paper describes one that does. The Governed Agent Doctrine is five enforcement rules plus a governed memory substrate, implemented and running as a single-operator personal AI operating system. Each rule is enforced mechanically, outside the language model, at the tool and infrastructure layer — because a control the model can talk its way around is not a control. The paper reports the system's own adversarial audit, including the uncomfortable parts: load-bearing behaviours held at 8/8 under probing, but the memory system's forgetting initially scored 0.5 — it re-asserted a corrected fact under paraphrase, until the guarantee was moved into the read path. The argument: the integration itself, and the discipline of auditing it, is the contribution. Governability is not a feature you add later; it is a property you build in by construction.
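To make "enforced mechanically, outside the language model, at the tool layer" concrete, the following is a minimal sketch of what such a gate could look like. It is not the paper's implementation: `ToolGate`, `PolicyViolation`, the cost table, and the hash-chained log format are all hypothetical names and design choices introduced here for illustration. The sketch assumes every tool call passes through one wrapper that checks a kill switch, an allowlist, and a budget before running anything, and that every decision is appended to a log in which each entry's hash covers the previous entry, so later edits are detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when the gate refuses a tool call."""


@dataclass
class ToolGate:
    # Hypothetical illustration; none of these names come from the paper.
    tools: Dict[str, Callable[..., Any]]   # allowlist: only these tools can run
    costs: Dict[str, float]                # assumed per-call cost of each tool
    budget: float                          # remaining spend the agent is allowed
    killed: bool = False                   # kill switch, set by the operator
    log: List[dict] = field(default_factory=list)

    def _append(self, event: dict) -> None:
        # Chain each entry to the previous one so tampering breaks verification.
        prev = self.log[-1]["hash"] if self.log else "0" * 64
        body = json.dumps(event, sort_keys=True, default=str)
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self.log.append({"event": event, "prev": prev, "hash": digest})

    def _refuse(self, name: str, reason: str) -> None:
        self._append({"tool": name, "outcome": "refused:" + reason})
        raise PolicyViolation(f"{name}: {reason}")

    def call(self, name: str, **kwargs: Any) -> Any:
        # Checks run in code, before the tool, regardless of what the model says.
        if self.killed:
            self._refuse(name, "kill_switch")
        if name not in self.tools:
            self._refuse(name, "not_allowed")
        cost = self.costs.get(name, 0.0)  # assumption: unlisted tools are free
        if cost > self.budget:
            self._refuse(name, "over_budget")
        self.budget -= cost
        result = self.tools[name](**kwargs)
        self._append({"tool": name, "args": kwargs, "outcome": "ok"})
        return result

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry yields False.
        prev = "0" * 64
        for entry in self.log:
            body = json.dumps(entry["event"], sort_keys=True, default=str)
            expected = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In this sketch the model only ever proposes a tool name and arguments; whether the call runs is decided by `ToolGate.call`, and refusals are logged alongside successes so an auditor can replay what was attempted. A real deployment would also need the log stored where the agent cannot rewrite it, which this in-memory example does not address.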
