A correctly governed agent system can still fail. An agent may select only actions that each individually satisfy every applicable rule, while its behavioral trajectory drifts silently toward high-risk territory. We call the structural interval in which these failures occur the Execution Gap: the space between what governance validates at decision boundaries and what agents actually do in execution. Existing approaches (prompt guards, OPA/XACML policy engines, Constitutional AI, and audit layers) are structurally incapable of closing this gap. They evaluate actions locally and statelessly; the Execution Gap is a trajectory-level, stateful phenomenon. This paper provides the first empirical demonstration that the Execution Gap is real, measurable, and closeable. We implement the complete Agent Governance Stack (Papers P0–P6: atomic decision boundaries, stateful admission control via ACP, invariant measurement via IML, governance structure, and reconstructive authority via RAM) as a Python library instrumented into a LangGraph StateGraph, and run four experiments that each isolate one dimension of the gap.

Key results:

Compliant drift (Exp. 1 + 1b): The enforcement signal g(τ) remains identically zero across all 2,700 drift steps (6 seeds × 450 steps) with the MockLLM, while the IML composite D̂ grows monotonically and crosses the detection threshold θ = 0.20 in T* ∈ [259, 403] steps, which is direct experimental proof that compliant drift is real. Replicated with two real LLMs (mistral-small3.1, T* = 64; deepseek-r1:8b, T* = 65; g(τ) = 0 throughout for both), confirming the finding is architectural, not model-specific.

Partial observability (Exp. 2): The RAM gate achieves IER = 0.000 at every state-coverage level (0.10–1.00), versus baseline IER ∈ [0.032, 0.185] for attestation and always-execute strategies (10,000 Monte Carlo samples per level).

Multi-agent coordination (Exp. 3): ACP replicates the formal bound CW_appr = 2N with zero deviation for N ∈ {2, 4, 8, 16} agents, confirming the result is framework-independent.

Full stack integration (Exp. 4): The integrated ACP + IML + RAM + RecoveryLoop stack converges with D̂ bounded in [0.27, 0.34] over 2,000 steps; liveness holds (49.5% of HALT events resolved by Recovery Loop); no deadlock.

Beyond confirmation, the implementation surfaces three refinements to the formal theory: the ACP baseline-RS assumption, liveness-rate classification for the conditional liveness theorem, and EMA convergence parametrization. The open-source implementation provides a deployable blueprint for practitioners integrating runtime governance into LangGraph-based agent systems.

Code and data: https://github.com/chelof100/agent-governance-applied

This is Paper 7 of the Agent Governance Series (P0–P7; Paper 8 on scale and heterogeneity is in preparation). Related papers: P0 (arXiv:2604.17511), P1/ACP (arXiv:2603.18829), P2/IML (arXiv:2604.17517), P5/RAM (arXiv:2604.22898).
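The compliant-drift result in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's ACP or IML implementation: the per-action rule, the drift rate, the smoothing factor, and every function name are hypothetical, and a plain exponential moving average (EMA) of deviation from a baseline stands in for the IML composite D̂, whose definition is not given here. Only the threshold θ = 0.20 and the 450-step run length are taken from the abstract. The point is structural: a stateless check that inspects one action at a time reports zero violations, while a stateful trajectory metric rises steadily and crosses the threshold.

```python
THETA = 0.20         # detection threshold quoted in the abstract
ACTION_LIMIT = 1.0   # hypothetical per-action rule: |action| <= ACTION_LIMIT
ALPHA = 0.05         # hypothetical EMA smoothing factor


def local_violation(action: float) -> int:
    """Stateless check: 1 if this single action breaks the rule, else 0."""
    return int(abs(action) > ACTION_LIMIT)


def run_drift(steps: int = 450, drift_per_step: float = 0.002):
    """Simulate an agent whose actions all pass the rule while drifting.

    Returns (total local violations, EMA history, first step above THETA).
    """
    baseline = 0.0    # reference behavior the agent started from
    ema = 0.0         # stateful stand-in for a trajectory-level drift score
    violations = 0
    history = []
    crossing = None
    for t in range(1, steps + 1):
        # Each action moves slightly further from the baseline but stays
        # inside the per-action limit (at most 0.9 with the defaults).
        action = baseline + drift_per_step * t
        violations += local_violation(action)
        deviation = abs(action - baseline)
        ema = ALPHA * deviation + (1 - ALPHA) * ema
        history.append(ema)
        if crossing is None and ema > THETA:
            crossing = t
    return violations, history, crossing


if __name__ == "__main__":
    violations, history, crossing = run_drift()
    print(f"local violations: {violations}")
    print(f"final drift score: {history[-1]:.3f}")
    print(f"first step above theta: {crossing}")
```

With these assumed parameters the per-action check never fires, yet the drift score crosses θ well before the run ends. The crossing step depends entirely on the chosen drift rate and smoothing factor, so it should not be compared with the T* values reported in the abstract.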
Marcelo Patricio Fernandez
Smile Train
Marcelo Patricio Fernandez studied this question.
www.synapsesocial.com/papers/69f5943c71405d493afff175 — DOI: https://doi.org/10.5281/zenodo.19929771