What question did this study set out to answer?

This research aims to address the integrity problems in agentic AI systems, specifically focusing on prompt injections.

June 10, 2026 · Open Access

Contract-Bound Cognitive Routing: A Control Architecture for Agentic AI Integrity, Delegation, and Prompt-Injection Containment

Key Points

Developed the Contract-Bound Cognitive Routing (CBCR) architecture, which models agentic execution as information flow over a typed, capability-gated graph.
Implemented a deterministic reference monitor, the MCP Policy Firewall, to regulate high-authority actions.
Applied a monotonic non-amplification rule to enhance multi-agent delegation control.
CBCR confines residual risks from untrusted content to a single endorsement gate with measurable false-endorsement rates.
Constrained the control flow of plans derived from trusted instructions by construction, with no reliance on the model's judgement.
Enforced deterministic constraints on structured data whose value-space is closed and validated, while noting that schema validation checks shape, not meaning.
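The monotonic non-amplification rule named in the key points can be illustrated with a minimal sketch. This is one plausible reading, not the paper's implementation: it assumes authority is modelled as a set of capability strings, and that a delegate may receive only a subset of its delegator's set. The names `Agent` and `delegate`, and the capability strings, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent record: a name plus the capabilities it holds."""
    name: str
    capabilities: frozenset[str]


def delegate(parent: Agent, child_name: str, requested: set[str]) -> Agent:
    """Create a child agent whose authority never exceeds the parent's.

    Assumption: "non-amplification" means the child's capability set must
    be a subset of the parent's. Any excess request is refused outright
    (fail closed) rather than silently trimmed.
    """
    excess = set(requested) - parent.capabilities
    if excess:
        raise PermissionError(
            f"delegation would amplify authority: {sorted(excess)}"
        )
    return Agent(child_name, frozenset(requested))


# Usage: authority can only shrink or stay equal along a delegation chain.
planner = Agent("planner", frozenset({"read_mail", "send_mail"}))
summarizer = delegate(planner, "summarizer", {"read_mail"})
```

Because each hop checks against the immediate delegator, a capability dropped at one step cannot reappear further down the chain, which is the monotonic property under this reading.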

Abstract

Agentic AI systems connect probabilistic reasoning to tools, memory, external data, other agents, and state-changing operations. Their dominant failure mode, prompt injection, is an integrity problem: low-integrity input contaminating high-authority action. Contract-Bound Cognitive Routing (CBCR) treats it as one, modelling agentic execution as information flow over a typed, capability-gated graph mediated by a deterministic reference monitor, the MCP Policy Firewall. The architecture is organised around a single distinction: what can be enforced deterministically, and what cannot. A large class of agentic behaviour can be constrained by construction (the control flow of a plan derived from trusted instructions, and structured data whose value-space is closed and validated) with no reliance on the model's judgement. The boundary is precise, and it is the line most defences blur: schema validation checks shape, not meaning, so type is not trust. Beyond that line (free text, semantically-loaded fields, data-derived parameters, untrusted endpoints) lies a residual that cannot be made deterministic. CBCR's contribution is to confine that residual to a single declared, fail-closed endorsement gate, make it measurable as false-labelling and false-endorsement rates weighted by reachable authority, and extend the discipline across multi-agent delegation through a monotonic non-amplification rule. It adopts the dual-path construction of Willison's dual-LLM pattern (2023) and CaMeL (Debenedetti et al., 2025); it does not solve conservative label propagation through a black-box model, which it states as the load-bearing open problem. CBCR does not make untrusted content safe. It prevents untrusted content from reaching high-authority sinks except through a declared gate, and turns the risk left behind into a measured quantity.

Version: v0.7

Language: English
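The core claim of the abstract, that untrusted content reaches high-authority sinks only through a declared, fail-closed gate, can be made concrete with a small sketch. It is an illustration under stated assumptions, not the MCP Policy Firewall itself: the class `PolicyFirewall`, the `Labeled` wrapper, the sink names in `SINK_AUTHORITY`, and the two-level integrity label are all hypothetical, and label propagation through the model (the open problem the abstract names) is not addressed.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


class Integrity(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


@dataclass(frozen=True)
class Labeled:
    """A value carrying an integrity label (hypothetical wrapper)."""
    value: object
    integrity: Integrity
    endorsed: bool = False


# Hypothetical sink registry; any sink not listed here is denied.
SINK_AUTHORITY = {"search_docs": "low", "send_payment": "high"}


class PolicyFirewall:
    """Deterministic monitor: decisions depend only on labels and the registry."""

    def __init__(self) -> None:
        # Raw counts only; a false-endorsement *rate* would additionally
        # need ground truth about which endorsements were wrong.
        self.endorsement_attempts = 0
        self.endorsements_granted = 0

    def endorse(self, item: Labeled, approver: Callable[[object], bool]) -> Labeled:
        """The single declared gate. Fails closed on refusal or error."""
        self.endorsement_attempts += 1
        try:
            approved = approver(item.value) is True
        except Exception:
            approved = False
        if not approved:
            raise PermissionError("endorsement refused")
        self.endorsements_granted += 1
        return replace(item, endorsed=True)

    def allows(self, sink: str, args: list[Labeled]) -> bool:
        authority = SINK_AUTHORITY.get(sink)
        if authority is None:
            return False  # unknown sink: fail closed
        if authority == "low":
            return True
        # High authority: every argument must be trusted or endorsed.
        return all(a.integrity is Integrity.TRUSTED or a.endorsed for a in args)


# Usage: a well-typed string from an untrusted source is still blocked.
fw = PolicyFirewall()
amount = Labeled(100, Integrity.TRUSTED)
memo = Labeled("pay invoice 17", Integrity.UNTRUSTED)
blocked = fw.allows("send_payment", [amount, memo])
endorsed_memo = fw.endorse(memo, approver=lambda v: True)
permitted = fw.allows("send_payment", [amount, endorsed_memo])
```

In this sketch the `memo` field passes any string schema yet is refused until endorsed, which mirrors the abstract's point that type is not trust. The `approver` callable stands in for whatever endorsement procedure a deployment declares; here it is a trivial placeholder.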
