What question did this study set out to answer?

This research evaluates the effectiveness of semantic prompt canonicalization in improving KV-cache reuse for large language model inference.

May 21, 2026 · Open Access

CanonCache: Semantic KV-Cache Canonicalization as a Strategy for LLM Inference Efficiency in Multi-Tenant Environments

Key Points

This research evaluates the effectiveness of semantic prompt canonicalization in improving KV-cache reuse for large language model inference.
A canonicalization layer rewrites prompts into standardized forms to increase prefix overlap.
A benchmark framework assesses cache hit rate, token reduction, and latency.
The experimental evaluation was conducted using the Qwen/Qwen3.5-9B model.
The cache hit rate increased from 5% to 37.5%.
Prompt tokens were reduced by 47.5%.
The approach achieved a semantic cache amplification ratio (SCAR) of 7.5× and a semantic compression efficiency (SCE) of 0.475.
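
The reported figures are mutually consistent under one plausible reading of the two metrics. The sketch below assumes that SCAR is the ratio of the canonicalized hit rate to the baseline hit rate, and that SCE is the fraction of prompt tokens removed. Both definitions are inferred from the numbers above, not quoted from the paper, and the token counts are hypothetical values chosen only to reproduce a 47.5% reduction.

```python
def scar(baseline_hit_rate: float, canonical_hit_rate: float) -> float:
    """Assumed definition: canonicalized hit rate divided by baseline hit rate."""
    if baseline_hit_rate <= 0:
        raise ValueError("baseline hit rate must be positive")
    return canonical_hit_rate / baseline_hit_rate


def sce(original_tokens: int, canonical_tokens: int) -> float:
    """Assumed definition: fraction of prompt tokens removed by canonicalization."""
    if original_tokens <= 0:
        raise ValueError("original token count must be positive")
    return (original_tokens - canonical_tokens) / original_tokens


# Hit rates in percent, taken from the key points above.
print(scar(5.0, 37.5))

# Hypothetical token counts (not from the paper) giving a 47.5% reduction.
print(sce(1000, 525))
```

Under these assumed definitions, 37.5 / 5 gives the reported 7.5×, and a 47.5% token reduction corresponds to an SCE of 0.475.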

Abstract

CanonCache is a research framework and benchmarking platform for evaluating semantic prompt canonicalization as a strategy to improve KV-cache prefix reuse in multi-tenant large language model (LLM) serving systems. Modern prefix-caching systems such as vLLM and SGLang require exact token-prefix matches to reuse KV-cache states efficiently. Real-world user prompts are often semantically equivalent but syntactically diverse, resulting in low cache hit rates and redundant GPU computation. CanonCache introduces a deterministic semantic canonicalization layer that rewrites semantically equivalent prompts into standardized canonical forms before inference, increasing exact-prefix overlap and enabling higher cache reuse.

The accompanying benchmark framework evaluates:

Prefix cache hit rate
Token reduction
Semantic similarity
Latency behavior
Novel efficiency metrics (SCAR and SCE)

Experimental evaluation using the Qwen/Qwen3.5-9B model via LM Studio demonstrates:

Cache hit rate increase from 5% to 37.5%
47.5% prompt token reduction
Semantic Cache Amplification Ratio (SCAR) of 7.5×
Semantic Compression Efficiency (SCE) of 0.475

This upload contains:

Research preprint
Benchmark framework
Sample benchmark datasets
Documentation

GitHub Repository: https://github.com/masoomul786/canon-cache
License: CC BY-NC-SA 4.0
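
To illustrate why canonicalization can raise exact-prefix overlap, here is a minimal toy sketch. It is not the CanonCache implementation: the `FILLERS` list, the `canonicalize` rules, and the whitespace tokenization are all assumptions made for illustration, and stripping punctuation this aggressively would be too lossy for real prompts.

```python
import re

# Hypothetical filler phrases; a real canonicalizer would need far richer rules.
FILLERS = ("please", "kindly", "could you", "can you")


def canonicalize(prompt: str) -> str:
    """Toy deterministic rewrite: lowercase, drop fillers and punctuation."""
    text = prompt.lower()
    for filler in FILLERS:
        text = re.sub(r"\b" + re.escape(filler) + r"\b", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())         # collapse runs of whitespace


def shared_prefix_len(a: list, b: list) -> int:
    """Number of leading tokens that two token lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


p1 = "Please summarize the following article: Cats sleep a lot."
p2 = "Could you summarize the following article?  Dogs bark."

# Whitespace splitting stands in for a real tokenizer here.
raw = shared_prefix_len(p1.split(), p2.split())
canon = shared_prefix_len(canonicalize(p1).split(), canonicalize(p2).split())
print(raw, canon)  # prints "0 4"
```

The two raw prompts share no leading tokens, so an exact-prefix cache cannot reuse anything between them. After the toy rewrite, both begin with the same four tokens ("summarize the following article"), which is the kind of overlap a prefix cache can exploit.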

Cite This Study

Masoomul Haque Choudhury studied this question.

synapsesocial.com/papers/6a0ea196be05d6e3efb606eb https://doi.org/10.5281/zenodo.20299052
