What question did this study set out to answer?

How can the scalability and context management challenges that large language models face during extended dialogues be addressed?

April 8, 2026 · Open Access

DSPM: Dynamic Semantic Patch Memory for Token-Efficient Long-Dialogue LLM Context Management

Key Points

Addresses scalability and context management challenges in large language models during extended dialogues.
Introduces Dynamic Semantic Patch Memory (DSPM) framework for compression.
Decomposes conversational memory into typed semantic patches.
Employs deterministic and utility-driven operators to manage token budget effectively.
Achieves a mean Token Reduction Rate (TRR) of 82.4%.
Surpasses design targets of 55% and 60% reduction.
Maintains a mean consistency score of 3.57/5.0 relative to full-history baselines.

Abstract

Large language models (LLMs) deployed in extended, multi-turn dialogue settings face a fundamental scalability bottleneck: raw conversation histories grow without bound, rapidly exhausting fixed context windows and inflating inference costs. Existing mitigation strategies, namely sliding-window truncation and monolithic LLM summarization, achieve token reduction at the expense of critical semantic fidelity. We present Dynamic Semantic Patch Memory (DSPM), a structured, seven-technique compression framework that decomposes conversational memory into typed semantic patches and maintains a token-budget-constrained context through a pipeline of deterministic and utility-driven operators. DSPM achieves a mean Token Reduction Rate (TRR) of 82.4% ± 4.21% across seven heterogeneous technical dialogue scenarios, surpassing the 55% and 60% design targets, while retaining a mean consistency score of 3.57/5.0 relative to full-history baselines. Critical constraints and decisions are preserved through a guaranteed retention mechanism, yielding a mean Critical Retention Rate (CRR) of 94.2%. All experiments are reproducible on commodity hardware using free-tier API access, demonstrating the accessibility of the approach.
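To make the idea of typed patches under a token budget concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give formulas or operator details, so several things here are assumptions: the `Patch` fields, the type labels, the greedy utility-ordered selection, and the definition of TRR as one minus the ratio of compressed to full-history tokens. Token counts are supplied by hand rather than computed by a tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical typed semantic patch; field names are illustrative only."""
    kind: str          # assumed type label, e.g. "constraint", "decision", "detail"
    text: str
    tokens: int        # token count, supplied by the caller
    utility: float     # assumed relevance score in [0, 1]
    critical: bool = False


def select_patches(patches, budget):
    """Keep every critical patch, then add the rest in descending utility
    order while the token budget allows. Critical patches are retained even
    if they alone exceed the budget (assumed reading of 'guaranteed retention')."""
    kept = [p for p in patches if p.critical]
    used = sum(p.tokens for p in kept)
    optional = sorted(
        (p for p in patches if not p.critical),
        key=lambda p: p.utility,
        reverse=True,
    )
    for p in optional:
        if used + p.tokens <= budget:
            kept.append(p)
            used += p.tokens
    return kept


def token_reduction_rate(full_tokens, compressed_tokens):
    """Assumed definition: fraction of full-history tokens removed."""
    return 1.0 - compressed_tokens / full_tokens


def critical_retention_rate(patches, kept):
    """Assumed definition: share of critical patches that survive selection."""
    critical = [p for p in patches if p.critical]
    if not critical:
        return 1.0
    return sum(1 for p in critical if p in kept) / len(critical)


history = [
    Patch("constraint", "Database must be PostgreSQL 15", 10, 1.0, critical=True),
    Patch("decision", "Deploy window is Friday", 8, 1.0, critical=True),
    Patch("detail", "Index on user_id was discussed", 30, 0.9),
    Patch("detail", "Alternative ORM options compared", 50, 0.5),
    Patch("chitchat", "Greetings and small talk", 102, 0.1),
]

kept = select_patches(history, budget=60)
full = sum(p.tokens for p in history)
compressed = sum(p.tokens for p in kept)
trr = token_reduction_rate(full, compressed)
crr = critical_retention_rate(history, kept)
```

In this toy history, the two critical patches are always kept, and only the highest-utility detail fits in the remaining budget. A real pipeline would need a tokenizer and a principled utility score, both of which are outside what the abstract specifies.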
