CanonCache is a research framework and benchmarking platform for evaluating semantic prompt canonicalization as a strategy to improve KV-cache prefix reuse in multi-tenant large language model (LLM) serving systems.

Modern prefix-caching systems such as vLLM and SGLang require exact token-prefix matches to reuse KV-cache states efficiently. Real-world user prompts are often semantically equivalent but syntactically diverse, resulting in low cache hit rates and redundant GPU computation. CanonCache introduces a deterministic semantic canonicalization layer that rewrites semantically equivalent prompts into standardized canonical forms before inference, increasing exact-prefix overlap and enabling higher cache reuse.

The accompanying benchmark framework evaluates:
- Prefix cache hit rate
- Token reduction
- Semantic similarity
- Latency behavior
- Novel efficiency metrics (SCAR and SCE)

Experimental evaluation using the Qwen/Qwen3.5-9B model via LM Studio demonstrates:
- Cache hit rate increase from 5% to 37.5%
- 47.5% prompt token reduction
- Semantic Cache Amplification Ratio (SCAR) of 7.5×
- Semantic Compression Efficiency (SCE) of 0.475

This upload contains:
- Research preprint
- Benchmark framework
- Sample benchmark datasets
- Documentation

GitHub Repository: https://github.com/masoomul786/canon-cache
License: CC BY-NC-SA 4.0
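To make the canonicalization idea and the two efficiency metrics concrete, here is a minimal Python sketch. It is not taken from the CanonCache repository. The filler-phrase rule, the function names `canonicalize`, `scar`, and `sce`, and the token counts are hypothetical. The metric definitions are also assumptions: SCAR is taken to be the ratio of the cache hit rate after canonicalization to the baseline hit rate, and SCE the fraction of prompt tokens removed. These readings are consistent with the reported figures (37.5% / 5% = 7.5 and a 47.5% reduction giving 0.475), but the preprint's own definitions should be treated as authoritative.

```python
import re

# Hypothetical filler phrases to strip. This is an illustrative assumption,
# not the rule set used by CanonCache.
_FILLERS = re.compile(r"\b(please|kindly|could you|can you)\b", re.IGNORECASE)


def canonicalize(prompt: str) -> str:
    """Deterministically map a prompt to a toy canonical form."""
    text = _FILLERS.sub(" ", prompt.lower())  # lowercase, drop filler phrases
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation
    return " ".join(text.split())             # collapse whitespace


def scar(baseline_hit_pct: float, canonical_hit_pct: float) -> float:
    """Assumed definition: hit rate after canonicalization over baseline."""
    return canonical_hit_pct / baseline_hit_pct


def sce(original_tokens: int, canonical_tokens: int) -> float:
    """Assumed definition: fraction of prompt tokens removed."""
    return (original_tokens - canonical_tokens) / original_tokens


# Two differently worded prompts collapse to one canonical string, so an
# exact-prefix cache would see identical input for both.
a = canonicalize("Could you please summarize this article?")
b = canonicalize("Summarize   this article")
same_form = a == b

# Reported hit rates (5% and 37.5%), and hypothetical token counts chosen
# only to reproduce a 47.5% reduction.
reported_scar = scar(5.0, 37.5)
example_sce = sce(1000, 525)
```

In this toy setup both prompts reduce to `summarize this article`. A real canonicalizer would need to preserve meaning much more carefully, which is why the benchmark also tracks semantic similarity.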
Masoomul Haque Choudhury studied this question.