Modern AI systems interact with external tools through structured protocols such as the Model Context Protocol (MCP), which require delivering complete tool definitions to the language model on every conversational turn. In multi-server environments connecting 5-20 tool providers, aggregate tool manifests consume 30,000-80,000 tokens per turn. We present a three-part architecture: (1) a multi-layer compression pipeline, operating as a transparent bidirectional proxy, that achieves 67-84% token reduction on empirical MCP traffic across 10 servers and 102 tools; (2) a decoupled schema registry that separates tool metadata delivery from tool execution, serving compressed Tool Cards through the AI platform's native deferred-loading mechanism; and (3) an empirical adaptive routing layer that scores model capability from observed tool-call performance and routes each invocation to the lowest-cost capable provider. We introduce a four-problem taxonomy of MCP overhead: format verbosity (~13%), content duplication (~10%), authoring quality (~40%), and cloud bypass (~37%). Empirical evaluation across 10 MCP servers shows per-server cleaning rates from 0% to 84%, with a weighted average of 67%. The architecture is validated by 804+ automated tests, including adversarial hardening across three independent AI models. A hallucination-to-execution bypass prevention mechanism closes a critical security gap.

Patent support: Patents 4 (Proxy Compression), 5 (Session Continuity), 6 (Intelligence Router), and 8 (Schema Registry); USPTO application nos. 64/022,435, 64/022,445, 64/022,455, and 64/022,475. All four patents covering this architecture were filed March 30, 2026.
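The routing idea in part (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Provider` class, the Laplace-smoothed success rate used as the capability score, the 0.8 threshold, and the cost figures are all assumptions chosen for illustration. The sketch only shows the general shape of scoring capability from observed tool-call outcomes and picking the cheapest provider that clears a threshold.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical provider record; field names are illustrative only."""
    name: str
    cost_per_call: float  # assumed relative cost, not a real price
    successes: int = 0    # observed successful tool calls
    attempts: int = 0     # observed tool calls in total

    def capability(self) -> float:
        # Assumed scoring rule: Laplace-smoothed success rate, so a
        # provider with no history starts at 0.5 rather than 0 or 1.
        return (self.successes + 1) / (self.attempts + 2)

    def record(self, ok: bool) -> None:
        # Update the observed performance after each tool call.
        self.attempts += 1
        if ok:
            self.successes += 1


def route(providers: list[Provider], threshold: float = 0.8) -> Provider:
    """Return the lowest-cost provider whose score meets the threshold.

    If no provider qualifies, fall back to the highest-scoring one.
    """
    capable = [p for p in providers if p.capability() >= threshold]
    if not capable:
        return max(providers, key=lambda p: p.capability())
    return min(capable, key=lambda p: p.cost_per_call)
```

Under these assumptions, a cheap provider keeps receiving calls while its observed success rate stays above the threshold, and traffic shifts to a costlier provider once recorded failures pull its score below it.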
P. Jeremiah Hundley studied this question.