Persistent memory is crucial for enabling Large Language Models (LLMs) to retain and expand knowledge over the long term, beyond the limits of a restricted context window. This work builds upon the theoretical Neuromorphic Cognitive Architecture and presents an implementation of persistent latent memory for Transformers. The memory combines a sharp representation of memory traces as latent vector centers (LTM: 64-dimensional keys, STM: 16-dimensional keys) with a compressed 3D terrain (48³), in which information diffuses and is homeostatically balanced. A two-phase reading mechanism uses a TerrainPrior module (3D prior) and MemoryAttention (RBF kernel attention), with gated integration into the decoder. Memory writing occurs segment-wise and is weighted by a combination of novelty, prediction error, and emotional salience; we include separate short-term memory (STM) and long-term memory (LTM), with periodic consolidation ("sleep") instead of hard deletion. In the experimental section, we simulate long-term operation (on the order of months) using synthetic data and measure key metrics: retention (information preservation), interference (mixing of traces), growth of memory centers, fatigue (need for consolidation), and the evolution of the distributed memory terrain (H3). The results show that the proposed memory can preserve knowledge outside the context window for thousands of interactions without significant degradation. An ablation study confirms the benefit of diffusion (which eliminates local saturation) and of the STM layer (which filters noise before writing to LTM), while the TerrainPrior module, surprisingly, did not improve retrieval accuracy. We discuss the implications of these findings and outline further research directions, particularly the integration of the memory into full LLMs and lifelong learning without retraining the model's main weights.
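To make the read path more concrete, the following is a minimal sketch of RBF kernel attention over memory centers, followed by a gated blend into a decoder state. It is an illustration under stated assumptions, not the paper's implementation: the function names `rbf_attention_read` and `gated_integrate`, the Gaussian kernel with a single bandwidth `sigma`, and the scalar gate are all hypothetical choices, and the TerrainPrior phase is omitted.

```python
import math


def rbf_attention_read(query, keys, values, sigma=1.0):
    """Return (read_vector, weights) using Gaussian RBF attention.

    Assumption: weights are a normalized Gaussian kernel of the squared
    Euclidean distance between the query and each memory center (key).
    """
    # Squared Euclidean distance from the query to every center.
    sq_dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]
    # Subtract the smallest distance so the largest score is exp(0) = 1,
    # which avoids a zero normalizer when all centers are far away.
    d_min = min(sq_dists)
    scores = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Kernel-weighted average of the stored values.
    dim = len(values[0])
    read = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return read, weights


def gated_integrate(hidden, read, gate):
    """Blend the memory read into a hidden state; gate is a scalar in [0, 1]."""
    return [(1.0 - gate) * h + gate * r for h, r in zip(hidden, read)]


if __name__ == "__main__":
    keys = [[0.0, 0.0], [10.0, 10.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    read, weights = rbf_attention_read([0.0, 0.0], keys, values)
    print(weights, gated_integrate([0.0, 0.0], read, gate=0.5))
```

With a gate of 0 the decoder state passes through unchanged, which is one simple way to keep memory integration controlled; in a trained model the gate would presumably be learned rather than fixed.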
The main contributions of this work are two operational invariants that are key for long-term LLM memory:
• Constant data size: The memory module is pre-allocated to a fixed volume (the entire structure is ~500 MB in the prototype architecture, of which effectively used data is ~1.3 MB/user) and does not increase with operating time or the amount of stored data.
• Constant low read/write latency: Read/write operations occur in microseconds and are independent of the memory "age" (number of interactions) and the volume of stored data, thanks to fixed capacity and local computation.
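The two invariants can be illustrated with a small fixed-capacity store. This is only a sketch under assumptions: the class name `FixedCapacityMemory`, the `merge_radius` parameter, and the strength-weighted merge rule are hypothetical and not taken from the paper. The point it shows is that all slots are allocated up front, and that a write either fills a free slot or is blended into the nearest existing center, so the structure never grows and nothing is hard-deleted.

```python
class FixedCapacityMemory:
    """Pre-allocated store of memory centers; its size is fixed at creation."""

    def __init__(self, capacity, dim):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # All storage is allocated here and never resized afterwards.
        self.keys = [[0.0] * dim for _ in range(capacity)]
        self.strength = [0.0] * capacity
        self.used = 0

    def _nearest(self, key):
        """Return (index, squared distance) of the closest used center."""
        best, best_d = -1, float("inf")
        for i in range(self.used):
            d = sum((a - b) ** 2 for a, b in zip(key, self.keys[i]))
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def write(self, key, weight, merge_radius=0.5):
        """Store a trace; return the index of the slot that absorbed it."""
        if weight <= 0.0:
            raise ValueError("weight must be positive")
        i, d = self._nearest(key)
        full = self.used == len(self.keys)
        if i >= 0 and (d <= merge_radius ** 2 or full):
            # Assumed merge rule: strength-weighted mean of old and new key.
            s = self.strength[i]
            self.keys[i] = [(s * a + weight * b) / (s + weight)
                            for a, b in zip(self.keys[i], key)]
            self.strength[i] = s + weight
            return i
        # Otherwise occupy the next free pre-allocated slot.
        slot = self.used
        self.keys[slot] = list(key)
        self.strength[slot] = weight
        self.used += 1
        return slot
```

Because the scan in `_nearest` is bounded by the fixed capacity, the cost of a write in this sketch depends on the capacity rather than on how many interactions have occurred. A production design would presumably use a more local lookup to reach the microsecond range reported above.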
Michal Seidl
Michal Seidl (Fri,) studied this question.
www.synapsesocial.com/papers/696c785beb60fb80d139680e — DOI: https://doi.org/10.5281/zenodo.18267378