June 2, 2026 · Open Access

Selective Gains, Unstable Averages: A Diagnostic Study of Query-Local Overlay in Sparse KV Retention

Key Points

Evaluated whether query-local anchoring can improve sparse KV-cache retention without sacrificing context coverage.
Conducted a controlled diagnostic study of deterministic anchor-window policies on Qwen/Qwen2.5-1.5B-Instruct.
Ran a focused 3-task pilot, then a broader evaluation on 6 tasks with 32 examples per task.
Compared policies using paired bootstrap confidence intervals, alongside a dense full-cache reference.
The overlay's macro F1 gain over the structural baseline was small (+0.0042), and its 95% confidence interval crossed zero.
Per-cell results were heterogeneous: overlay gained mean F1 in 9 cells and lost it in 3.
Both sparse policies remained far below the full-cache reference.
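The key points report a paired bootstrap interval on the macro F1 difference and a per-cell win/loss tally. The sketch below shows one common way to compute both. It assumes that a "cell" is one task-budget combination, that per-example F1 scores are aligned across the two policies, that resampling happens within each cell, and that the interval is a percentile interval. The study's exact resampling scheme and replicate count are not given here, so `n_boot`, `seed`, the cell names, and all function names are hypothetical.

```python
import random
from statistics import mean


def macro_f1(scores_by_cell):
    """Macro average: unweighted mean over cells of each cell's mean F1."""
    return mean(mean(scores) for scores in scores_by_cell.values())


def paired_bootstrap_ci(base, over, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for macro_f1(over) - macro_f1(base).

    base, over: dict mapping cell -> list of per-example F1, aligned by example.
    Each replicate resamples example indices within every cell and applies the
    same indices to both policies; sharing the indices is what makes it paired.
    """
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        cell_diffs = []
        for cell, b in base.items():
            o = over[cell]
            idx = [rng.randrange(len(b)) for _ in range(len(b))]
            cell_diffs.append(mean(o[i] - b[i] for i in idx))
        deltas.append(mean(cell_diffs))
    deltas.sort()
    k = int(round(alpha / 2 * n_boot))  # replicates trimmed from each tail
    point = macro_f1(over) - macro_f1(base)
    return point, deltas[k], deltas[n_boot - 1 - k]


def win_loss(base, over):
    """Count cells where the overlay's mean F1 is above / below the baseline's."""
    wins = sum(mean(over[c]) > mean(base[c]) for c in base)
    losses = sum(mean(over[c]) < mean(base[c]) for c in base)
    return wins, losses


if __name__ == "__main__":
    # Toy per-example F1 scores for two hypothetical cells (not study data).
    base = {"task_a@20%": [0.2, 0.4, 0.6], "task_b@20%": [0.5, 0.5, 0.5]}
    over = {"task_a@20%": [0.3, 0.5, 0.7], "task_b@20%": [0.4, 0.5, 0.5]}
    print(paired_bootstrap_ci(base, over, n_boot=500))
    print(win_loss(base, over))
```

Because every cell carries equal weight, the mean of per-cell differences equals the difference of macro averages, so the point estimate and the bootstrap replicates measure the same quantity. An interval whose lower end is below zero and upper end above zero corresponds to the "crosses zero" result reported above.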

Abstract

Sparse KV-cache retention must decide which prompt tokens remain under a fixed memory budget. A natural hypothesis is that adding a small amount of query-local anchoring to a strong structural baseline should repair hard question-focused failures without giving up broad context coverage. We test this hypothesis in a controlled diagnostic study of deterministic anchor-window policies for Qwen/Qwen2.5-1.5B-Instruct on LongBench-style long-context QA. A focused 3-task pilot suggested that structural-query overlay can repair the hardest Qasper cell at 20% budget and improve pilot macro F1. We then ran a broader support evaluation with 6 tasks, 32 examples per task, a dense full-cache reference, and paired bootstrap confidence intervals, all within the same preserved Qwen harness. In the broader study, overlay remains helpful on some cells, but its macro F1 gain over the structural baseline is small (+0.0042), the 95% confidence interval crosses zero, and the per-cell pattern is heterogeneous (9 F1 wins, 3 losses). Both sparse policies remain far below the full-cache reference. These results support a narrower conclusion than the pilot alone: in this setting, lightweight query-local overlay yields selective repairs but does not provide a robust average improvement over a matched structural sparse baseline. The strongest generalization axis here is broader task-budget support under one fixed harness, not cross-model or method-class generalization. The evidence therefore suggests, rather than establishes, that future gains in sparse KV retention will require richer evidence-selection mechanisms than simple overlay heuristics.
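The abstract names the two sparse policies but not their selection rules. As a hypothetical illustration only, the sketch below assumes that the structural baseline keeps evenly spaced fixed-size windows across the prompt (for broad coverage). It also assumes that the overlay reserves a fraction `overlay_frac` of the same token budget for windows centred on context tokens that also occur in the question, a simple lexical stand-in for query-local anchoring. Both rules, the function names, and every parameter are assumptions for illustration, not the study's implementation.

```python
def structural_keep(n_tokens, budget, window):
    """Hypothetical structural policy: evenly spaced fixed-size windows.

    Returns a set of kept positions whose size never exceeds `budget`.
    """
    if n_tokens <= budget:
        return set(range(n_tokens))  # everything fits, nothing to evict
    n_windows = budget // window
    keep = set()
    if n_windows == 0:
        return keep  # budget too small for even one window
    stride = n_tokens / n_windows
    for w in range(n_windows):
        start = int(w * stride)
        keep.update(range(start, min(start + window, n_tokens)))
    return keep


def overlay_keep(context_tokens, query_tokens, budget, window, overlay_frac=0.25):
    """Hypothetical structural + query-local overlay under the same budget.

    Part of the budget is reserved for windows centred on context positions
    whose token also appears in the query (a simple lexical anchor).
    """
    n = len(context_tokens)
    if n <= budget:
        return list(range(n))
    reserve = int(budget * overlay_frac)
    keep = structural_keep(n, budget - reserve, window)
    query_vocab = set(query_tokens)
    half = window // 2
    for pos, tok in enumerate(context_tokens):  # deterministic left-to-right order
        if tok not in query_vocab:
            continue
        span = set(range(max(0, pos - half), min(n, pos + half + 1))) - keep
        if len(keep) + len(span) > budget:
            break  # stop once the next anchor window would exceed the budget
        keep |= span
    return sorted(keep)


if __name__ == "__main__":
    context = [f"t{i}" for i in range(100)]
    budget = int(0.2 * len(context))  # a 20% retention budget
    print(sorted(structural_keep(len(context), budget, window=4)))
    print(overlay_keep(context, ["t55"], budget, window=4))
```

In this sketch the overlay windows are carved out of the same budget rather than added on top, so comparing `overlay_keep` with `structural_keep` at equal budget stays matched. The price is that the structural allocation shrinks, which is exactly the coverage trade-off the hypothesis hopes overlay can avoid.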
