What question did this study set out to answer?

This research aims to improve tool-schema compression methods in AI systems, focusing on efficiency and effectiveness.

April 28, 2026Open Access

TSCG: Token-Context Semantic Grammar for Tool-Schema Compression in Agentic AI Systems

Key Points

This research aims to improve tool-schema compression methods in AI systems, focusing on efficiency and effectiveness.
Developed TSCG, a deterministic compiler for tool-schema compression.
Conducted empirical evaluations across 12+ language models and 20,000+ API calls.
Established a taxonomy of model-format interactions and an adaptive detection methodology.
JSON schema format identified as the primary bottleneck for small-model tool-calling.
Models show accuracy retention rates between 108-181% alongside 46-72% token savings.
Provided a comprehensive findings library and released four npm packages for implementation.

Abstract

We introduce TSCG (Token-Context Semantic Grammar), a deterministic compiler for tool-schema compression in agentic AI systems. Unlike learned compression methods, TSCG provides formal guarantees (≥51% savings on well-formed schemas) with sub-millisecond execution and zero runtime dependencies. Through systematic empirical evaluation across 12+ language models (4B to frontier scale) and 20, 000+ API calls, we establish that JSON schema format itself—not model capacity—is the primary bottleneck for small-model tool-calling. Format translation alone explains the majority of token-cost variance (R²=0. 88 collapses to R²=0. 03 when JSON baseline is removed). Our main contributions: 1. A taxonomy of model-format interactions identifying four distinct classes: format-dominated, compression-friendly, operator-neutral, and combination-fragile. 2. Per-operator decomposition framework demonstrating that operator effects are empirical and per-model—not vendor-architectural. Within-family inversions (GPT-5. 4 vs GPT-5. 2) refute hardcoded vendor classifications. 3. Adaptive empirical detection methodology (180-call sweep, ~1) for production deployment. 4. External validation on Berkeley Function Calling Leaderboard (BFCL) across Claude Sonnet 4, GPT-4o, GPT-5. 2: all three models show accuracy improvements (108-181% Accuracy Retention Rate) alongside 46-72% token savings. We release reference implementation as 4 npm packages (@tscg/core, @tscg/mcp-proxy, @tscg/tool-optimizer, @tscg/openclaw) with comprehensive findings library.

Read Full Paperexternally

Bookmark

View Full Paper

Cite This Study

Furkan Sakizli (Sun,) studied this question.

synapsesocial.com/papers/69f04e9b727298f751e72926 https://doi.org/https://doi.org/10.5281/zenodo.19795991

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Bookmark

View Full Paper