This paper introduces the Think-Answer Quantization Gap (TAQG), a theoretical framework showing that uniform KV cache quantization is provably suboptimal for large reasoning models whenever think-phase and answer-phase tokens differ in pairwise cosine redundancy. The framework is direction-agnostic: it prescribes fewer bits for whichever phase exhibits higher redundancy. Empirical validation on DeepSeek-R1-Distill-Qwen-1.5B reveals a surprising, model-size-dependent redundancy reversal: answer-phase tokens exhibit higher redundancy than think-phase tokens, the opposite of the findings on the full 671B model. Code and experimental data are included.
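To make the direction-agnostic rule concrete, here is a minimal Python sketch built on stated assumptions. It takes a phase's redundancy to be the mean pairwise cosine similarity of its key vectors. The function names (`mean_pairwise_cosine`, `allocate_bits`) and the 2-bit/4-bit widths are hypothetical illustrations, not the paper's implementation. The sketch only shows how the more redundant phase, whichever it is, receives fewer bits.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mean_pairwise_cosine(vectors: List[Vector]) -> float:
    """Assumed redundancy measure: average cosine over all unordered pairs i < j."""
    n = len(vectors)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(vectors[i], vectors[j])
            pairs += 1
    return total / pairs


def allocate_bits(
    think: List[Vector],
    answer: List[Vector],
    low_bits: int = 2,   # hypothetical width for the more redundant phase
    high_bits: int = 4,  # hypothetical width for the less redundant phase
) -> Tuple[int, int]:
    """Hypothetical direction-agnostic rule: the more redundant phase gets fewer bits.

    Returns (think_bits, answer_bits). Equal redundancy falls back to uniform high_bits.
    """
    r_think = mean_pairwise_cosine(think)
    r_answer = mean_pairwise_cosine(answer)
    if r_think > r_answer:
        return low_bits, high_bits
    if r_answer > r_think:
        return high_bits, low_bits
    return high_bits, high_bits


if __name__ == "__main__":
    # Toy 2-D "keys": the answer-phase vectors are nearly parallel (highly redundant).
    think = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    answer = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]]
    print(allocate_bits(think, answer))  # (4, 2): answer phase gets fewer bits
```

In this toy input the answer phase has a mean pairwise cosine near 0.997, while the think phase sits near 0.471. The rule therefore gives the answer phase fewer bits, which is the same direction as the reversal reported for the 1.5B model. Because the rule compares the two redundancy scores without assuming which is larger, it handles both that case and the opposite ordering reported for the 671B model.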
Raviteja Nekkalapu studied this question.