May 1, 2026 | Open Access

Typestate-Enforced Agent Loops: Making Policy Gates Unskippable at Compile Time

Key Points

The research aims to create a compile-time policy gate to ensure secure AI agent operations by preventing unauthorized actions.
Introduced typestate-enforced encoding for the ORGA loop in the Symbiont AI agent runtime.
Conducted empirical evaluations across nine LLMs to measure the effectiveness of the policy gate.
Performed ablation studies to identify the impact of different security layers on system vulnerabilities.
263 forbidden tool-call attempts were refused across 874 cloud-adversarial runs, with zero attempts reaching execution.
The Cedar policy gate added 30–95 µs of overhead per call, several orders of magnitude below the LLM round-trip time.
Zero Cedar policy denials were observed on legitimate workloads across 8 of 9 models (813 runs); the one exception, gpt-oss-20b, was a model misconception that the gate correctly caught.

Abstract

This preprint introduces a typestate-enforced encoding of the Observe-Reason-Gate-Act (ORGA) loop used in the Symbiont AI agent runtime. The construction makes the policy gate a compile-time predecessor of tool dispatch in any program that typechecks against the runtime's public API. Skipping the gate, dispatching without reasoning, observing without dispatching, or substituting an action between policy approval and execution are not policy violations but expressions that fail to compile. We argue that this shifts agent security from runtime interception (the prevailing pattern in Python frameworks like LangChain and AutoGen, where enforcement correctness is an emergent property of how callbacks are wired) to structural enforcement (where the type system precludes the failure mode). We formalize the resulting phase-ordering guarantee, identify five explicit assumptions under which it holds, and address the time-of-check-to-time-of-use gap by showing how affine ownership semantics seal the approved action inside the gated phase.

Empirical results. The paper reports an evaluation across nine widely available hosted LLMs (GPT-5, Claude Haiku 4.5, Gemini 2.5 Pro, DeepSeek-V3.1, Qwen3-235B, Qwen3.6-Plus, MiMo-V2-Pro, MiniMax-M2.7, gpt-oss-20b) routed through OpenRouter in April 2026.

Refusals: 263 forbidden tool-call attempts refused across 874 cloud-adversarial runs, with zero attempts reaching execution; cumulative across all sweeps: 642 Cedar plus 34 executor refusals.

Measured structural-enforcement overhead: the Cedar policy gate adds 30–95 µs per call; the content sanitiser adds approximately 345 ns per call. Both are four to seven orders of magnitude below the LLM round-trip the agent loop is bounded by.

Per-model latency, throughput, and cost baseline: identifies Claude Haiku 4.5 as the speed/quality/cost tripoint (6.3 s p50 task latency, 1066 tok/s, 0.0096 per run at 100% pass rate) and MiniMax-M2.7 as the cheapest 100% pass rate at 0.001 per run.

False-positive baseline: zero Cedar policy denials on legitimate workloads across 8 of 9 models (813 runs); the gpt-oss-20b case is explained as a model misconception correctly caught.

Tool-result injection: a new attack variant in which adversarial content arrives through tool results rather than prompts. Cedar refused four attempted forbidden calls across 75 cloud runs against three models; one model exhibited a content-shadow effect (5/25 wrong-but-plausible answers) that crossed the action and content fences but was caught by the task grader, prompting the paper to elevate the grader to a load-bearing fence type.

Stack-stripping ablation: a v12 ablation experiment confirms each fence catches an attack class the others do not. Removing the policy gate exposes 100% of out-of-profile dispatches; removing the content sanitiser exposes 92.2% of invisible-content payloads (389 of 422 stored procedures); removing ToolClad would expose 99.4% of tool-arg-injection payloads on the v11 corpus. The ablation also surfaces a previously under-emphasized finding: the action layer is two independent fences (Cedar plus the executor profile-of-one), not one. With Cedar disabled, the executor's name-membership check still refused 219 out-of-profile dispatches across 434 rows.

Compile-fail tests: nine compile-fail tests verify the typestate property on every CI run by exhibiting the expected compiler diagnostic for each illegal state transition and feature-gating violation.

What this is and is not. The paper is a technical preprint, not a peer-reviewed systems paper. It claims a narrow result (the policy gate is structurally unavoidable) and is explicit about what it does not claim (does not introduce typestates, does not prevent prompt injection, does not fix policy errors, does not claim performance superiority over runtime interception). The construction relocates the security surface rather than eliminating it; the residual surface (policy correctness, tool faithfulness, supply-chain discipline maintaining unsafe/FFI assumptions) is non-trivial and is the work of other layers of the Open Agent Trust Stack (OATS) specification.

Reproduction artifact. A companion repository at github.com/ThirdKeyAI/symbiont-orga-demo packages the benchmark harness, the perf aggregator used to produce the paper's tables, the nine compile-fail tests with pinned .stderr snapshots, the Cedar policy files, the adversarial prompt corpus including the --tool-result-injection variant, and the OpenRouter sweep scripts. The v10 instrumentation work (gate-latency counters, sanitiser metrics feature, and the tool-result injection attack) lives on the v10-instrumentation-and-attacks branch; the v12 stack-stripping ablation lives on the v12-ablation branch. Cloning the demo and running cargo test re-verifies the typestate compile-fail suite locally; running the sweep scripts with an OpenRouter key reproduces the per-model latency, cost, and refusal numbers.

Software. The Symbiont runtime is open source under Apache 2.0 at github.com/thirdkeyai/symbiont. The Open Agent Trust Stack (OATS) specification is published at openagenttruststack.org. The companion ToolClad preprint (Layer 2 of the OATS stack) is also available on Zenodo.

Contents: 20 pages, four tables (cloud-adversarial refusals by model, per-model performance baseline, expanded nine-row compile-fail test summary, and a new ablation launch chart), eight sections including Introduction, Background and Related Work, the ORGA construction, formal guarantee with assumptions, comparison with runtime interception, empirical evaluation across ten subsections, portability to non-Rust ecosystems, and conclusion.

Changes from v0.4: Added Section 6.8 reporting a stack-stripping ablation experiment that empirically grounds the OATS non-redundancy claim across the policy gate, content sanitiser, and ToolClad fences. The ablation surfaces a previously under-emphasized architectural finding: the action layer in Symbiont is two independent fences (Cedar plus the executor profile-of-one), not one. Section 4's fence taxonomy is correspondingly updated from four to five layers, with both the executor profile-of-one and the grader elevated from auxiliary to first-class fence status. The companion ToolClad preprint is added as reference 15 for the typed-argument fence row of the ablation launch chart.

Keywords: AI agents; LLM tool use; agent security; typestate pattern; Rust; affine types; policy enforcement; Cedar; zero trust; OATS; Symbiont; Model Context Protocol; phase-ordering guarantees; structural enforcement; ablation; layered defense.

Correspondence: jascha@thirdkey.ai
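Illustrative sketch. The phase-ordering idea described in the abstract can be shown with a minimal Rust typestate encoding. This is not Symbiont's actual API: the module name orga, the types Observing, Reasoned, Approved, Action, Policy, and Denied, and the allow-list policy standing in for Cedar are all hypothetical names chosen for this example. The sketch assumes only what the abstract states: each phase is a distinct type, the gate is the sole way to obtain an approved value, and ownership moves prevent an approval from being reused or its action swapped.

```rust
// Illustrative sketch only: every name here is hypothetical, not Symbiont's API.
pub mod orga {
    use std::collections::HashSet;

    /// A tool call proposed by the model.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Action {
        pub tool: String,
        pub arg: String,
    }

    /// Stand-in for a policy engine: a plain allow-list of tool names.
    pub struct Policy {
        allowed: HashSet<String>,
    }

    impl Policy {
        pub fn allowing(tools: &[&str]) -> Self {
            Policy { allowed: tools.iter().map(|t| t.to_string()).collect() }
        }
    }

    /// Refusal returned by the gate.
    #[derive(Debug, PartialEq)]
    pub struct Denied {
        pub tool: String,
    }

    // One type per phase. Fields are private to this module, so outside code
    // can only obtain a phase value through the transitions below.
    pub struct Observing { context: Vec<String> }
    pub struct Reasoned { context: Vec<String>, proposed: Action }
    pub struct Approved { context: Vec<String>, action: Action }

    impl Observing {
        pub fn start(observation: &str) -> Self {
            Observing { context: vec![observation.to_string()] }
        }

        pub fn context(&self) -> &[String] {
            &self.context
        }

        /// Observe -> Reason: consumes the observing phase.
        pub fn reason(self, proposed: Action) -> Reasoned {
            Reasoned { context: self.context, proposed }
        }
    }

    impl Reasoned {
        /// Reason -> Gate: the only way to construct `Approved`.
        pub fn gate(self, policy: &Policy) -> Result<Approved, Denied> {
            if policy.allowed.contains(&self.proposed.tool) {
                Ok(Approved { context: self.context, action: self.proposed })
            } else {
                Err(Denied { tool: self.proposed.tool })
            }
        }
    }

    impl Approved {
        /// Gate -> Act: consumes the approval, runs exactly the approved
        /// action, and returns to the observing phase with the tool result.
        pub fn act(self, run: impl FnOnce(&Action) -> String) -> Observing {
            let result = run(&self.action);
            let mut context = self.context;
            context.push(result);
            Observing { context }
        }
    }
}

// From outside `orga`, the compiler rejects expressions such as:
//   orga::Observing::start("x").act(|_| String::new());
//       // E0599: `Observing` has no method `act`, so the gate cannot be skipped
//   let a = reasoned.gate(&policy).unwrap();
//   let _ = a.act(f); let _ = a.act(g);
//       // E0382: `a` was moved by the first call, so an approval is single-use
```

Because the action field of Approved is private and act takes self by value, code outside the module has no expression that replaces the approved action or dispatches it twice. That is the property the abstract attributes to affine ownership; the allow-list here only marks where a real policy engine would sit.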
