What type of study is this?

This is a quantitative study.

October 13, 2025 | Open Access

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models

Key Points

WeightedKV improves context integrity while reducing memory usage during autoregressive generation.
The method merges values of less important tokens based on attention scores, enhancing output quality.
Assessment on four language modeling datasets shows superior performance compared to baseline methods.
Singular value decomposition shows that keys and values behave differently in the cache: keys are strongly low-rank, while values are not.
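
As a rough illustration of the singular value decomposition point, the sketch below counts how many singular values are needed to capture a given share of a matrix's spectral energy. The function name `rank_for_energy`, the 90% threshold, and the synthetic matrices are assumptions made for this example; they are not taken from the paper, and real key and value caches would have to be extracted from a model.

```python
import numpy as np

def rank_for_energy(matrix, energy=0.9):
    """Return how many singular values capture `energy` of the squared spectrum.

    Hypothetical helper: a small result relative to min(matrix.shape)
    suggests a strong low-rank structure.
    """
    s = np.linalg.svd(matrix, compute_uv=False)   # singular values, descending
    cumulative = np.cumsum(s**2) / np.sum(s**2)   # cumulative energy share
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s))                         # guard against round-off

# Synthetic stand-ins only (assumption): a rank-2 matrix plays the role of a
# "key-like" cache, a dense Gaussian matrix the role of a "value-like" cache.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 64))
dense = rng.normal(size=(256, 64))

low_rank_k = rank_for_energy(low_rank)
dense_k = rank_for_energy(dense)
print(low_rank_k, dense_k)
```

On data like this, the low-rank matrix needs at most two singular values while the dense one needs many more, which is the kind of contrast the key point describes for keys versus values.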

Abstract

Large Language Models (LLMs) use a key-value (KV) cache to reduce redundant computation in autoregressive generation. However, the KV cache grows linearly during generation, leading to excessive memory usage, especially for long texts. Most KV cache compression methods evict unimportant KV pairs to maintain a fixed cache size, which permanently removes those tokens from the context. Singular value decomposition, however, shows that values do not exhibit the strong low-rank property that keys do, suggesting that information is distributed more evenly across values and more redundantly within keys. Methods that evict both keys and values therefore risk losing crucial information and compromising context integrity, ultimately degrading output quality. To address this problem, we propose WeightedKV, a novel, training-free approach that discards the keys of less important tokens while merging their values into neighboring tokens via a convex combination weighted by their average attention scores. In this way, the retained keys serve as anchors that guide the generation process, while the merged values provide a rich contextual backdrop. We assess our method on four widely used language modeling datasets, demonstrating superior performance compared to all baseline methods, particularly at lower budget ratios.
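
The merging step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `merge_least_important`, the choice of the right-hand neighbor (falling back to the left at the end of the cache), and the decision to leave the neighbor's score unchanged are all hypothetical. The abstract specifies only that the key is discarded and the value is merged into a neighboring token by a convex combination weighted by average attention scores.

```python
import numpy as np

def merge_least_important(keys, values, scores):
    """Drop the lowest-scoring token's key and fold its value into a neighbor.

    keys, values: arrays of shape (n, d).
    scores: shape (n,), positive average attention scores per cached token.
    Returns the compressed (keys, values, scores), each with n - 1 rows.
    """
    n = scores.shape[0]
    if n < 2:
        raise ValueError("need at least two cached tokens to merge")

    i = int(np.argmin(scores))            # token whose key will be discarded
    j = i + 1 if i + 1 < n else i - 1     # assumed neighbor choice

    # Convex combination: weights are non-negative and sum to 1.
    w_i = scores[i] / (scores[i] + scores[j])
    w_j = 1.0 - w_i

    new_values = values.copy()
    new_values[j] = w_j * values[j] + w_i * values[i]

    keep = np.arange(n) != i              # remove the discarded token's row
    # Assumption: the neighbor's score is kept as-is; the abstract does not
    # say how scores are updated after a merge.
    return keys[keep], new_values[keep], scores[keep]

keys = np.arange(8.0).reshape(4, 2)
values = np.array([[1.0, 0.0], [0.0, 4.0], [8.0, 0.0], [2.0, 2.0]])
scores = np.array([0.4, 0.1, 0.3, 0.2])

new_keys, new_values, new_scores = merge_least_important(keys, values, scores)
print(new_values)
```

Here token 1 has the lowest score, so its key is dropped and its value is blended into token 2 with weights 0.25 and 0.75. Repeating this step until the cache fits a fixed budget keeps some trace of every value while still shrinking the number of stored keys.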
