What question did this study set out to answer?

This research aims to establish a kernel-resident tool governance system for AI agents to enhance safety during external tool calls.

April 20, 2026 · Open Access

Governed MCP: Kernel-Level Tool Governance for AI Agents via Logit-Based Safety Primitives

Key Points

Proposed Governed MCP as a governance gateway operating at the kernel level.
Developed a six-layer pipeline for tool call interposition, including validation and semantic checks.
Implemented the system in Anima OS using ~86,000 lines of Rust code.
Measured overhead per tool call to assess performance and safety efficacy.
Conducted a 4-configuration ablation study on a benchmark for tool governance effectiveness.
Observed that F1 dropped from 0.773 to 0.327 when the ProbeLogits layer was removed, indicating that this layer is essential.
Established that existing userspace safety measures can be bypassed easily, underscoring the necessity for kernel-level enforcement.
Achieved complete mediation of the WASM ABI surface through the governance gateway, preventing userspace bypass.

Abstract

AI agents increasingly call external tools (file system, network, APIs) through the Model Context Protocol (MCP). These tool calls are the agent's syscalls (privileged operations with side effects on shared state), yet today's safety enforcement lives entirely in userspace, where a 10-line script can bypass it. I propose Governed MCP, a kernel-resident tool governance gateway built on a logit-based safety primitive (ProbeLogits, companion paper). The gateway interposes on every MCP tool call in a 6-layer pipeline: schema validation, trust tier check, rate limit, adversarial pre-filter, ProbeLogits gate (the load-bearing semantic check), and constitutional policy match, with a Blake3-hashed audit chain. I implement Governed MCP in Anima OS, a bare-metal x86_64 OS in ~86,000 lines of Rust. The five non-inference layers add 65.3 μs of overhead per call; ProbeLogits adds 65 ms (per-token-class semantic decision) on 7B Q4_0. A 4-config ablation on a 101-prompt MCP-domain benchmark shows that removing the ProbeLogits layer collapses F1 from 0.773 to 0.327 (ΔF1 = -0.446): hand-rule firewalling alone is insufficient. All 15 WASM-to-system host functions in the runtime route through the gateway (complete mediation of the WASM ABI surface; the scope and caveats of this claim are stated in §4.6); a 10-LoC userspace bypass that defeats existing guardrail libraries is structurally impossible against the kernel-resident gate. To my knowledge, no prior system places semantic safety enforcement below the agent's privilege boundary in an operating system. Governed MCP demonstrates that tool-call governance is feasible as an OS primitive, not just an application-layer concern.
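To make the pipeline's control flow concrete, the sketch below shows one way the six layers named in the abstract could be evaluated in order, with the first failing layer denying the call and every decision appended to a hash-linked audit log. It is an illustration under stated assumptions, not the Anima OS implementation: the `ToolCall` fields, the check functions, the tool names, and the `govern` entry point are all hypothetical; each check body is a trivial placeholder (in particular, `probe_ok` is a string match standing in for a model-based ProbeLogits decision); and the standard library's non-cryptographic `DefaultHasher` stands in for Blake3 so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The six layers, in the order the abstract lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    SchemaValidation,
    TrustTier,
    RateLimit,
    AdversarialPreFilter,
    ProbeLogitsGate,
    ConstitutionalPolicy,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny(Layer),
}

/// Hypothetical tool-call record; the fields are illustrative only.
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub args: String,
    pub trust_tier: u8,
}

type Check = fn(&ToolCall) -> bool;

// Placeholder checks (assumptions): each returns true when the call may proceed.
fn schema_ok(c: &ToolCall) -> bool { !c.tool.is_empty() }
fn tier_ok(c: &ToolCall) -> bool { c.trust_tier >= 1 }
fn rate_ok(_c: &ToolCall) -> bool { true }
fn prefilter_ok(c: &ToolCall) -> bool { !c.args.contains("ignore previous instructions") }
// Stand-in only: the real gate is described as a logit-based semantic decision.
fn probe_ok(c: &ToolCall) -> bool { !c.args.contains("/etc/shadow") }
fn policy_ok(c: &ToolCall) -> bool { c.tool != "net.raw_socket" }

const PIPELINE: [(Layer, Check); 6] = [
    (Layer::SchemaValidation, schema_ok),
    (Layer::TrustTier, tier_ok),
    (Layer::RateLimit, rate_ok),
    (Layer::AdversarialPreFilter, prefilter_ok),
    (Layer::ProbeLogitsGate, probe_ok),
    (Layer::ConstitutionalPolicy, policy_ok),
];

/// Digest of one log entry, bound to the previous entry's digest.
/// DefaultHasher is NOT cryptographic; it stands in for Blake3 here.
fn link(prev: u64, record: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    record.hash(&mut h);
    h.finish()
}

/// Append-only log in which each digest covers the one before it.
pub struct AuditChain {
    head: u64,
    pub entries: Vec<(u64, String)>,
}

impl AuditChain {
    pub fn new() -> Self {
        Self { head: 0, entries: Vec::new() }
    }

    pub fn append(&mut self, record: &str) -> u64 {
        self.head = link(self.head, record);
        self.entries.push((self.head, record.to_string()));
        self.head
    }

    /// Recompute every link; any edited record breaks the chain.
    pub fn verify(&self) -> bool {
        let mut prev = 0u64;
        for (digest, record) in &self.entries {
            if link(prev, record.as_str()) != *digest {
                return false;
            }
            prev = *digest;
        }
        true
    }
}

/// Run the layers in order, stop at the first failure, and log the verdict.
pub fn govern(call: &ToolCall, audit: &mut AuditChain) -> Verdict {
    let mut verdict = Verdict::Allow;
    for (layer, check) in PIPELINE.iter() {
        if !check(call) {
            verdict = Verdict::Deny(*layer);
            break;
        }
    }
    audit.append(&format!("{} -> {:?}", call.tool, verdict));
    verdict
}
```

In this sketch a caller would create one `AuditChain`, pass each `ToolCall` through `govern`, and act only on `Verdict::Allow`. Ordering the cheap checks ahead of the inference-backed gate mirrors the cost split reported in the abstract (65.3 μs for the five non-inference layers versus 65 ms for ProbeLogits), although whether the actual system short-circuits in this way is an assumption.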

Authors

Daeyeon Son
