January 18, 2026 · Open Access

Implementation of Persistent Latent Memory for Decoder Transformers

Key Points

The research aims to develop a persistent memory system for transformers that aids in long-term retention of knowledge beyond a context window.
Developed a hybrid memory system that combines latent vector centers for short-term and long-term memory with a compressed 3D terrain.
Implemented a two-phase reading mechanism using TerrainPrior and MemoryAttention for effective information processing.
Simulated long-term operation to evaluate memory retention, interference, and fatigue using synthetic data.
The proposed memory system preserves knowledge outside the context window for thousands of interactions without significant degradation.
In the ablation study, diffusion eliminated local saturation in the memory.
The STM layer effectively filtered noise before writing to LTM.
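
As a rough illustration of why diffusion can relieve local saturation, the sketch below applies one explicit diffusion step to a small cubic grid. The grid size, the `rate` parameter, the 6-neighbor stencil, and the function name `diffuse_step` are assumptions made for this example and are not taken from the paper; homeostatic balancing is not modeled.

```python
# Offsets of the six face-adjacent neighbors in a 3D grid.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def diffuse_step(grid, rate=0.1):
    """One explicit diffusion step on an n x n x n grid (nested lists).

    Neighbors outside the grid are skipped (no-flux boundary), so the total
    amount stored in the grid is conserved. Keep rate <= 1/6 for stability.
    """
    n = len(grid)
    new = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                center = grid[x][y][z]
                laplacian = 0.0
                for dx, dy, dz in NEIGHBORS:
                    i, j, k = x + dx, y + dy, z + dz
                    if 0 <= i < n and 0 <= j < n and 0 <= k < n:
                        laplacian += grid[i][j][k] - center
                new[x][y][z] = center + rate * laplacian
    return new

def total_mass(grid):
    """Sum of all cell values in the grid."""
    return sum(v for plane in grid for row in plane for v in row)

# A single saturated cell in the middle of an otherwise empty 5x5x5 terrain.
terrain = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
terrain[2][2][2] = 1.0
smoothed = diffuse_step(terrain, rate=0.1)
```

After one step the peak is lower and its neighbors have picked up part of the value, while the total stays the same. Repeated steps spread a concentrated write over a wider region.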

Abstract

Persistent memory is crucial for enabling Large Language Models (LLMs) to retain and expand knowledge over the long term, beyond the limits of a restricted context window. This work builds upon the theoretical Neuromorphic Cognitive Architecture and presents an implementation of persistent latent memory for Transformers. The memory combines a sharp representation of memory traces as latent vector centers (LTM: 64-dimensional keys, STM: 16-dimensional keys) with a compressed 3D terrain (48³), in which information diffuses and is homeostatically balanced. A two-phase reading mechanism uses a TerrainPrior module (3D prior) and MemoryAttention (RBF kernel attention) with controlled integration into the decoder (gating). Memory writing occurs segment-wise and is weighted by a combination of novelty, prediction error, and emotional salience; the design includes separate short-term (STM) and long-term (LTM) memories with periodic consolidation ("sleep") instead of hard deletion. In the experimental section, we simulate long-term operation (on the order of months) using synthetic data and measure key metrics: retention (information preservation), interference (mixing of traces), growth of memory centers, fatigue (need for consolidation), and the evolution of the distributed memory terrain (H3). The results show that the proposed memory can preserve knowledge outside the context window for thousands of interactions without significant degradation. An ablation study confirms the benefit of diffusion (eliminates local saturation) and the STM layer (filters noise before writing to LTM), while the TerrainPrior module surprisingly did not yield an improvement in retrieval accuracy. We discuss the implications of these findings and outline further research directions, particularly the integration of memory into full LLMs and lifelong learning without retraining the model's main weights.
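
To make the read path more concrete, here is a minimal sketch of RBF kernel attention over stored memory centers, followed by a scalar-gated addition into a decoder hidden state. The kernel width `sigma`, the additive gate form `h + g * m`, and the function names are assumptions for illustration; the abstract does not specify how MemoryAttention or the gate is parameterized.

```python
import math

def rbf_memory_read(query, keys, values, sigma=1.0):
    """Read from memory using normalized RBF kernel weights.

    Each key gets weight exp(-||q - k||^2 / (2 * sigma^2)); weights are then
    normalized to sum to 1 and used to average the stored values.
    """
    sq_dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    # Shifting by the smallest distance leaves the normalized weights
    # unchanged and avoids underflow when every key is far from the query.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    readout = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return readout, weights

def gated_integrate(hidden, readout, gate):
    """Add the memory readout to the hidden state, scaled by a gate in [0, 1]."""
    return [h + gate * m for h, m in zip(hidden, readout)]

# Two stored traces; the query sits on top of the first key.
keys = [[0.0, 0.0], [10.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
readout, weights = rbf_memory_read([0.0, 0.0], keys, values, sigma=1.0)
mixed = gated_integrate([1.0, 1.0], [2.0, 4.0], gate=0.5)
```

With a gate of 0 the decoder state passes through unchanged, which is one simple way to keep memory influence under control.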
The main contributions of this work are two operational invariants that are key for long-term LLM memory:
• Constant data size: The memory module is pre-allocated to a fixed volume (the entire structure is ∼500 MB in the prototype architecture, of which effectively used data is ∼1.3 MB/user) and does not increase with operating time or the amount of stored data.
• Constant low read/write latency: Read/write operations occur in microseconds and are independent of the memory "age" (number of interactions) and the volume of stored data, thanks to fixed capacity and local computation.
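
The constant-size invariant can be illustrated with a pre-allocated pool of slots in which a write either fills a free slot or nudges the nearest existing center, so storage never grows. This is a sketch under stated assumptions: the class `FixedCapacityMemory`, the mixing coefficients in `coeffs`, and the nearest-center update rule are hypothetical and are not the paper's write procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class FixedCapacityMemory:
    """Memory with a fixed number of slots allocated up front."""

    def __init__(self, num_slots, dim):
        self.keys = [[0.0] * dim for _ in range(num_slots)]
        self.strength = [0.0] * num_slots  # 0.0 marks an unused slot

    def write(self, vector, novelty, pred_error, salience,
              coeffs=(0.5, 0.25, 0.25)):
        """Write with a weight mixed from novelty, prediction error, salience.

        Returns the slot index that was updated, or None if the weight is 0.
        """
        w = coeffs[0] * novelty + coeffs[1] * pred_error + coeffs[2] * salience
        w = max(0.0, min(1.0, w))  # clamp the write weight to [0, 1]
        if w == 0.0:
            return None
        free = [i for i, s in enumerate(self.strength) if s == 0.0]
        if free:
            slot = free[0]
            self.keys[slot] = list(vector)
        else:
            # No free slot: move the nearest center toward the new vector.
            slot = min(range(len(self.keys)),
                       key=lambda i: sq_dist(self.keys[i], vector))
            self.keys[slot] = [k + w * (v - k)
                               for k, v in zip(self.keys[slot], vector)]
        self.strength[slot] += w
        return slot

memory = FixedCapacityMemory(num_slots=2, dim=2)
first = memory.write([1.0, 0.0], novelty=1.0, pred_error=1.0, salience=1.0)
second = memory.write([0.0, 1.0], novelty=1.0, pred_error=1.0, salience=1.0)
third = memory.write([1.0, 0.2], novelty=0.5, pred_error=0.5, salience=0.5)
```

Because the third write lands on an existing center instead of allocating a new one, the number of slots stays at two no matter how many writes follow.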
