This work proposes a novel architectural approach to long-term memory in transformer-based language models. We introduce a compact recurrent state combined with spectral decomposition and per-component exponential decay, enabling the model to maintain information across multiple temporal scales. Unlike standard transformers, which rely solely on growing context windows, and unlike existing state-space models, our method explicitly decomposes the recurrent state into a spectral basis with learnable decay rates. This allows different components of memory to operate on different timescales: some capture local context, while others preserve long-term thematic information.

We present results from a minimal viable experiment on a small prototype model (~2 million parameters) trained on synthetic data with controlled topic dynamics. Key findings include:

- State Variation of 0.09, indicating active temporal dynamics in the spectral memory
- Delayed Topic Probe accuracy exceeding 35% at a distance of 128 tokens (random baseline: 20%)
- A reasonable distribution of learned decay rates (mean Γ ≈ 1.2)

These results demonstrate that, even at a very small scale, the proposed spectral memory mechanism enables statistically significant retention of thematic information over meaningful distances.

The document also outlines a comprehensive experimental framework for future work, including causal analysis through targeted ablation of spectral components, mutual information analysis between decay rates and information types, and scaling experiments. This research contributes to the growing body of work on efficient long-context modeling by offering an interpretable and structured approach to recurrent memory in transformers.
Serhii Kanivets (Mon,) studied this question.
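To make the idea of per-component exponential decay concrete, the following is a minimal sketch, not the author's implementation. It assumes a leaky-integrator update in which component k retains a fraction exp(-γ_k) of its value per step. The abstract does not specify the update rule, so the function names (`decay_factors`, `step`, `half_life`), the decay rates, and the impulse input are all hypothetical and chosen only for illustration.

```python
import math

def decay_factors(gammas, dt=1.0):
    """Per-component retention factor exp(-gamma * dt) for one step."""
    return [math.exp(-g * dt) for g in gammas]

def step(state, inputs, gammas, dt=1.0):
    """One recurrent update (assumed form): each component decays at its
    own rate, then mixes in its share of the new input."""
    factors = decay_factors(gammas, dt)
    return [a * s + (1.0 - a) * u for s, u, a in zip(state, inputs, factors)]

def half_life(gamma):
    """Number of steps after which a component's stored value halves."""
    return math.log(2.0) / gamma

# Hypothetical decay rates: one fast, one medium, one slow component.
gammas = [1.0, 0.1, 0.005]
state = [0.0, 0.0, 0.0]

# Write a unit impulse into every component, then feed zeros for 128 steps.
state = step(state, [1.0, 1.0, 1.0], gammas)
initial = list(state)
for _ in range(128):
    state = step(state, [0.0, 0.0, 0.0], gammas)

# Fraction of the written value that each component still holds.
retained = [s / s0 for s, s0 in zip(state, initial)]
for g, r in zip(gammas, retained):
    print(f"gamma={g}: half-life={half_life(g):.1f} steps, retained={r:.3g}")
```

Under these assumptions, the slow component still holds roughly half of what was written after 128 steps, while the fast component has effectively forgotten it. This is the sense in which different decay rates give different memory timescales.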