Updated version of the Sparse Semantic Patch Memory (SSPM) framework, including the full architecture, experimental validation based on Gemini 2.5 Flash, and an extended empirical evaluation demonstrating significant token reduction while preserving reasoning constraints.

Large Language Model (LLM) deployments face practical limitations due to restricted context windows, increasing inference costs, and the accumulation of redundant conversational history. Traditional approaches such as token truncation or surface-level compression often discard important constraints and decisions, which degrades reasoning continuity.

This repository presents Sparse Semantic Patch Memory (SSPM), a semantics-first conversational memory architecture that represents dialogue history as compact, utility-scored semantic patches rather than raw token sequences. Each conversational turn is decomposed into structured units such as entities, constraints, decisions, code snippets, equations, and structural cues, which are extracted using a DeepSeek-style semantic extraction pipeline implemented with Gemini 2.5 Flash. The SSPM framework applies schema-guided extraction, composite utility scoring, and dependency-aware greedy knapsack selection to retain only the most valuable patches under a strict token budget. Selected patches are stored in an indexed sparse memory structure and dynamically composed with the current query to form a compact prompt for downstream reasoning.

Empirical evaluation across five multi-turn technical dialogues demonstrates that SSPM achieves an average token reduction of 48.7% while preserving 100% of explicit constraints and decisions, significantly outperforming conventional raw history, truncation, and compression baselines.

The system is implemented as a fully modular pipeline consisting of semantic extraction, utility scoring, sparse selection, indexed memory storage, and prompt composition.
This design enables scalable, cost-aware conversational memory management and provides a practical foundation for building long-context reasoning systems and agentic LLM architectures.
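
To make the selection and composition stages concrete, the following is a minimal Python sketch of dependency-aware greedy selection under a token budget, followed by prompt composition. It is an illustration under stated assumptions, not the repository's implementation: the `Patch` fields, the utility-per-token ranking, the rule that a patch is admitted only together with its unmet dependencies, and the prompt layout are all hypothetical. Utility scores and token counts are taken as precomputed inputs, and every dependency id is assumed to refer to a known patch.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical semantic patch; field names are assumptions."""
    pid: str
    kind: str                  # e.g. "constraint", "decision", "entity"
    text: str
    tokens: int                # precomputed token cost
    utility: float             # precomputed composite utility score
    deps: tuple[str, ...] = () # ids of patches this one depends on


def unmet_closure(pid: str, by_id: dict[str, Patch], chosen: set[str]) -> list[str]:
    """Return pid plus its transitive dependencies that are not yet chosen."""
    needed, stack, seen = [], [pid], set(chosen)
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        needed.append(cur)
        stack.extend(by_id[cur].deps)
    return needed


def select_patches(patches: list[Patch], budget: int) -> list[Patch]:
    """Greedy knapsack heuristic: rank by utility per token, and admit a
    patch only if it and all of its unmet dependencies fit the budget."""
    by_id = {p.pid: p for p in patches}
    chosen: set[str] = set()
    used = 0
    ranked = sorted(patches, key=lambda p: p.utility / max(p.tokens, 1), reverse=True)
    for patch in ranked:
        if patch.pid in chosen:
            continue
        group = unmet_closure(patch.pid, by_id, chosen)
        cost = sum(by_id[g].tokens for g in group)
        if used + cost <= budget:
            chosen.update(group)
            used += cost
    # Preserve the original dialogue order in the result.
    return [p for p in patches if p.pid in chosen]


def compose_prompt(selected: list[Patch], query: str) -> str:
    """Join the selected patches with the current query into one prompt."""
    memory = "\n".join(f"[{p.kind}] {p.text}" for p in selected)
    return f"Memory:\n{memory}\n\nQuery: {query}"


# Illustrative data only; these patches are not taken from the evaluation.
history = [
    Patch("c1", "constraint", "Latency must stay under 200 ms.", 10, 0.9),
    Patch("d1", "decision", "Use a Redis cache to meet the latency limit.", 8, 0.8, ("c1",)),
    Patch("s1", "smalltalk", "Thanks, that was a helpful overview of options.", 30, 0.1),
    Patch("e1", "entity", "Service: checkout-api", 5, 0.2),
]
selected = select_patches(history, budget=25)
prompt = compose_prompt(selected, "Which cache eviction policy should we use?")
```

With a budget of 25 tokens, the decision `d1` is admitted together with the constraint `c1` it depends on, the small entity patch `e1` fills the remaining space, and the low-utility patch `s1` is dropped. A greedy ratio ranking is a heuristic and does not guarantee an optimal knapsack solution.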
Dhruv Dubey (Sun,) studied this question.