What question did this study set out to answer?

This research asks how large language models that reuse separate, pre-filled KV Caches in multi-context scenarios can recover the cross-attention that is lost between those contexts.

April 25, 2026 | Open Access

Multi-Context Concatenation Across Requests for LLMs

Key Points

The proposed method, CatLLM, concatenates multiple contexts across requests offline to compensate for the cross-attention that separate caching loses.
During offline processing, it identifies contexts that severely lack cross-attention using weighted inner products of Q and K vectors.
It reduces the number of contexts requiring caching from 10 to 7.
Fully concatenating all contexts improves the F1 score by 6% over the separate-caching baseline.
The proposed method achieves a 3% F1 improvement while minimizing context compression.
Performance improves even though fewer contexts require caching.

Abstract

Reusing separate, pre-filled Key-Value (KV) Caches for multiple contexts has become common practice in multi-context scenarios with Large Language Models. However, contexts cached separately cannot attend to one another, so cross-attention between them is lost. To address this, we propose CatLLM, the first method that concatenates multiple contexts across requests offline to compensate for this deficiency. Specifically, during offline processing, CatLLM takes the weighted inner products of the Q and K vectors for tokens in an un-concatenated context and rewrites them as an equivalent weighted formulation of the concatenated Q and K inner products. This yields a weighting $w_i^{A+B}$ that corresponds to the difference in the output vector. The weighting is then used to identify contexts with severe cross-attention deficiencies and to concatenate them into a single context for KV Cache computation. Experimental results show that, compared with the baseline of separate caching (i.e., no concatenation), fully concatenating all contexts improves the F1 score by 6%. Meanwhile, the proposed method reduces the number of contexts requiring caching from 10 to 7 while achieving a 3% F1 improvement, thereby maximizing the performance gain while minimizing the degree of context compression.
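The relationship between a cross-attention weighting and the output vector difference can be illustrated with a small single-layer example. The sketch below is not the paper's formulation, which this summary does not spell out. It assumes a standard softmax attention layer and uses a well-known identity: for one query token in context B, the output over the concatenation A+B equals a convex mix of the output over A alone and the output over B alone, with mixing weight `w` equal to the share of attention mass that falls on A. The function names and the toy vectors are hypothetical.

```python
import math

def scores(q, keys):
    """Scaled dot products of one query against a list of keys."""
    scale = math.sqrt(len(q))
    return [sum(a * b for a, b in zip(q, k)) / scale for k in keys]

def weighted_sum(weights, values):
    """Normalised weighted average of value vectors."""
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(dim)]

def cross_attention_gap(q, keys_a, values_a, keys_b, values_b):
    """Compare attention over B alone with attention over A+B.

    Returns (w, o_a, o_b, o_cat), where w is the attention mass on A
    (an illustrative stand-in, not the paper's exact weighting).
    """
    s_a, s_b = scores(q, keys_a), scores(q, keys_b)
    m = max(s_a + s_b)  # shared shift keeps exp() numerically stable
    e_a = [math.exp(s - m) for s in s_a]
    e_b = [math.exp(s - m) for s in s_b]
    w = sum(e_a) / (sum(e_a) + sum(e_b))
    o_a = weighted_sum(e_a, values_a)          # attends to A only
    o_b = weighted_sum(e_b, values_b)          # separate caching: B only
    o_cat = weighted_sum(e_a + e_b, values_a + values_b)  # concatenated
    return w, o_a, o_b, o_cat

# Toy data (hypothetical): one query from context B, two tokens in A, one in B.
q = [1.0, 0.0]
keys_a, values_a = [[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
keys_b, values_b = [[0.5, 0.5]], [[0.0, 2.0]]

w, o_a, o_b, o_cat = cross_attention_gap(q, keys_a, values_a, keys_b, values_b)

# The output difference caused by separate caching is w * (o_a - o_b).
diff = [c - b for c, b in zip(o_cat, o_b)]
predicted = [w * (a - b) for a, b in zip(o_a, o_b)]
```

In this sketch, a large `w` means the token would have placed much of its attention on the other context, so caching the two contexts separately changes its output more. The example covers only one attention layer and one query; in a full model the K and V vectors of deeper layers would themselves change under concatenation.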
