Sparse KV-cache retention must decide which prompt tokens to keep under a fixed memory budget. A natural hypothesis is that adding a small amount of query-local anchoring to a strong structural baseline should repair hard question-focused failures without sacrificing broad context coverage. We test this hypothesis in a controlled diagnostic study of deterministic anchor-window policies for Qwen/Qwen2.5-1.5B-Instruct on LongBench-style long-context QA. A focused 3-task pilot suggested that a structural-query overlay can repair the hardest Qasper cell at a 20% budget and improve pilot macro F1. We then run a broader evaluation with 6 tasks, 32 examples per task, a dense full-cache reference, and paired bootstrap confidence intervals, all within the same preserved Qwen harness. In this broader study, the overlay still helps on some cells, but its macro F1 gain over the structural baseline is small (+0.0042), the 95% confidence interval crosses zero, and mean F1 across cells is mixed (9 wins, 3 losses). Both sparse policies remain far below the full-cache reference. These results support a narrower conclusion than the pilot alone: in this setting, a lightweight query-local overlay yields selective repairs but no robust average improvement over a matched structural sparse baseline. The generalization tested here is breadth across tasks and budgets under one fixed harness, not generalization across models or method classes. The evidence therefore suggests, rather than establishes, that further gains in sparse KV retention will require richer evidence-selection mechanisms than simple overlay heuristics.
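To make the claim "the 95% confidence interval crosses zero" concrete, the sketch below shows one common way to compute a paired bootstrap interval for a macro F1 difference between two policies. It is a minimal illustration under stated assumptions, not the study's actual procedure. We assume per-example F1 scores are available as `(structural_f1, overlay_f1)` pairs grouped by task, that examples are resampled with replacement within each task (keeping each pair together), and that a simple percentile interval is used. The names `macro_delta` and `paired_bootstrap_macro_ci` are hypothetical.

```python
import random
from statistics import mean


def macro_delta(per_task):
    """Macro difference: average over tasks of the per-task mean (overlay - structural).

    Assumed input: {task_name: [(structural_f1, overlay_f1), ...]}.
    """
    return mean(mean(b - a for a, b in pairs) for pairs in per_task.values())


def paired_bootstrap_macro_ci(per_task, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the macro F1 difference (hypothetical helper).

    Resampling is stratified by task, and each (structural, overlay) pair is drawn
    together, so the interval reflects the paired design.
    """
    if not per_task or any(len(p) == 0 for p in per_task.values()):
        raise ValueError("every task needs at least one paired example")
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        resampled = {}
        for task, pairs in per_task.items():
            n = len(pairs)
            # Draw n example indices with replacement within this task.
            resampled[task] = [pairs[rng.randrange(n)] for _ in range(n)]
        stats.append(macro_delta(resampled))
    stats.sort()
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return macro_delta(per_task), (lo, hi)


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT results from the study.
    toy = {
        "task_a": [(0.2, 0.3), (0.5, 0.5), (0.1, 0.0)],
        "task_b": [(0.4, 0.4), (0.3, 0.35)],
    }
    point, (lo, hi) = paired_bootstrap_macro_ci(toy)
    print(f"macro delta = {point:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

Under this reading, a small positive point estimate whose interval includes zero, as reported above for the overlay, means that resampling examples within tasks often flips the sign of the macro difference, so the average gain is not robust even when individual cells improve.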
Yinuo Chen (Sun,) studied this question.