What question did this study set out to answer?

This research aims to establish a standardized framework for evaluating governance enforcement in autonomous AI systems during operation.

June 3, 2026 · Open Access

The Agentic Governance Benchmark: A Standardized Framework for Measuring Runtime Governance Enforcement in Autonomous AI Systems

Key Points

Introduced the Agentic Governance Benchmark (AGB), a scoring framework for assessing an organization's ability to govern autonomous AI agents in real time.
Evaluated six dimensions of governance: policy determinism, enforcement latency, receipt provenance, scope containment, jurisdictional enforcement, and override integrity.
Scored each dimension independently on a 0-100 scale and weighted the scores into a composite that determines governance maturity.
Established a five-tier maturity ranking from Ungoverned (0-14) to Sovereign (90-100).
Demonstrated, through a reference implementation based on the ExecLayer enforcement architecture, that Sovereign-tier governance (90-100) is achievable with current technology.
Provided a vendor-agnostic tool for assessing real-time governance enforcement across various AI deployment contexts.

Abstract

Autonomous AI agents are operating at scale across enterprise, government, and critical infrastructure systems. These agents make hundreds or thousands of consequential decisions per hour without per-decision human review. Current governance frameworks, including SOC 2, ISO 42001, NIST AI RMF, and the EU AI Act, define compliance requirements but do not measure whether those requirements are enforced at runtime on autonomous systems operating at machine speed. This paper introduces the Agentic Governance Benchmark (AGB), a standardized scoring framework that evaluates an organization's ability to govern autonomous AI agents in real time. The AGB measures six dimensions of governance readiness: policy determinism, enforcement latency, receipt provenance, scope containment, jurisdictional enforcement, and override integrity. Each dimension is scored independently on a 0-100 scale and weighted to produce a composite score that maps to one of five maturity tiers ranging from Ungoverned (0-14) to Sovereign (90-100). The benchmark is vendor-agnostic and designed to be administered against any agentic AI system regardless of model provider, deployment topology, or regulatory jurisdiction. A reference implementation based on the ExecLayer enforcement architecture demonstrates that Sovereign-tier governance is achievable with current technology. The AGB fills a critical measurement gap in the governance landscape by providing the first standardized method for quantifying runtime enforcement capability in autonomous AI systems.
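The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the dimension weights, the names of the three middle tiers, or their score boundaries, so everything below that fills those gaps is a labeled assumption: the weights are equal, the middle tiers carry placeholder names, and their cutoffs (15, 40, 65) are invented purely for illustration. Only the six dimension names, the 0-100 scale, and the Ungoverned (0-14) and Sovereign (90-100) bands come from the paper's summary.

```python
# Illustrative sketch only; not the AGB's published scoring rules.

# The six governance dimensions named in the abstract.
DIMENSIONS = (
    "policy_determinism",
    "enforcement_latency",
    "receipt_provenance",
    "scope_containment",
    "jurisdictional_enforcement",
    "override_integrity",
)

# ASSUMPTION: equal weights. The abstract says scores are weighted
# but does not give the actual weights.
WEIGHTS = {name: 1.0 for name in DIMENSIONS}

# (minimum composite, tier name), highest tier first.
# Only "Sovereign" (90-100) and "Ungoverned" (0-14) come from the source.
# ASSUMPTION: the middle names and the 65/40 cutoffs are placeholders.
TIERS = (
    (90, "Sovereign"),
    (65, "Tier 4 (placeholder)"),
    (40, "Tier 3 (placeholder)"),
    (15, "Tier 2 (placeholder)"),
    (0, "Ungoverned"),
)


def composite_score(scores, weights=WEIGHTS):
    """Return the weighted mean of the six dimension scores (0-100)."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be in [0, 100]")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    weighted_sum = sum(scores[name] * weights[name] for name in DIMENSIONS)
    return weighted_sum / total_weight


def maturity_tier(composite):
    """Map a composite score to the first tier whose floor it meets."""
    if not 0 <= composite <= 100:
        raise ValueError("composite must be in [0, 100]")
    for floor, name in TIERS:
        if composite >= floor:
            return name
    raise AssertionError("unreachable: the lowest tier floor is 0")


if __name__ == "__main__":
    example = dict.fromkeys(DIMENSIONS, 95)  # hypothetical assessment
    score = composite_score(example)
    print(score, maturity_tier(score))
```

Keeping the weights and tier table as plain data means that substituting the benchmark's real values, once known, would not require changing either function.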
