What question did this study set out to answer?

The aim is to establish a theoretical framework, the Think-Answer Quantization Gap, for optimizing KV cache quantization in large reasoning models.

April 12, 2026 · Open Access

Think Less, Store Smarter: A Theoretical Framework for Type-Aware KV Cache Quantization in Large Reasoning Models

Key Points

Introduced the Think-Answer Quantization Gap (TAQG) framework.
Proved the suboptimality of uniform KV cache quantization under certain conditions.
Validated the framework using the DeepSeek-R1-Distill-Qwen-1.5B model.
Found that answer-phase tokens showed higher cosine redundancy than think-phase tokens in the tested model.
Observed a model-size-dependent reversal in token redundancy compared to findings on the larger 671B model.

Abstract

This paper introduces the Think-Answer Quantization Gap (TAQG), a theoretical framework showing that uniform KV cache quantization is provably suboptimal for large reasoning models whenever think-phase and answer-phase tokens differ in pairwise cosine redundancy. The framework is direction-agnostic: it prescribes fewer bits for whichever phase exhibits higher redundancy. Empirical validation on DeepSeek-R1-Distill-Qwen-1.5B reveals a surprising model-size-dependent redundancy reversal: answer-phase tokens exhibit higher redundancy than think-phase tokens, the opposite of findings on the full 671B model. Code and experimental data are included.
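
The direction-agnostic rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that "pairwise cosine redundancy" means the mean cosine similarity over all distinct pairs of cached key vectors in a phase. The function names, the 2-bit and 4-bit widths, and the synthetic vectors are all hypothetical and chosen only for illustration.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Mean cosine similarity over all distinct pairs of vectors.

    Assumption: this stands in for "pairwise cosine redundancy";
    the paper's exact definition may differ.
    """
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            pairs += 1
    return total / pairs


def assign_bits(think_redundancy, answer_redundancy, low_bits=2, high_bits=4):
    """Direction-agnostic rule: the more redundant phase gets fewer bits.

    The bit widths are hypothetical placeholders, not values from the paper.
    """
    if think_redundancy > answer_redundancy:
        return {"think": low_bits, "answer": high_bits}
    if answer_redundancy > think_redundancy:
        return {"think": high_bits, "answer": low_bits}
    # No redundancy gap: fall back to a uniform allocation.
    return {"think": high_bits, "answer": high_bits}


# Synthetic stand-ins for cached key vectors (not real model data).
rng = random.Random(0)
dim, count = 16, 8
# "Think" vectors: independent Gaussian directions, so low redundancy.
think_keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(count)]
# "Answer" vectors: one shared direction plus small noise, so high redundancy.
base = [rng.gauss(0, 1) for _ in range(dim)]
answer_keys = [[b + 0.1 * rng.gauss(0, 1) for b in base] for _ in range(count)]

think_r = mean_pairwise_cosine(think_keys)
answer_r = mean_pairwise_cosine(answer_keys)
bits = assign_bits(think_r, answer_r)
print(f"think redundancy={think_r:.3f}, answer redundancy={answer_r:.3f}, bits={bits}")
```

In this synthetic setup the answer-phase vectors are built to be the more redundant ones, mirroring the direction reported for the 1.5B model. Swapping the two inputs flips the allocation, which is the sense in which the rule is direction-agnostic.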

Authors

Raviteja Nekkalapu
