What question did this study set out to answer?

This research aims to develop an architectural approach for improving long-term memory in transformers using spectral memory.

June 17, 2026 · Open Access

Recurrent State Injection with Fading Spectral Memory: A Research Proposal and Preliminary Results for Long-Context Coherence in Transformers

Key Points

Introduces a compact recurrent state, decomposed into a spectral basis with learnable per-component decay rates.
Reports a minimal viable experiment with a prototype model (~2 million parameters) trained on synthetic data with controlled topic dynamics.
Outlines a framework for future causal analysis and scaling experiments.
Reports a State Variation of 0.09, interpreted as a sign of active temporal dynamics in the spectral memory.
Reports a Delayed Topic Probe accuracy exceeding 35% at a distance of 128 tokens (random baseline: 20%).
Reports a mean learned decay rate of Γ ≈ 1.2, described as part of a reasonable distribution of decay rates.
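
How much weight a probe accuracy of 35% against a 20% chance baseline can carry depends on how many probe examples were evaluated, a number this summary does not state. The following is a minimal sketch of an exact one-sided binomial check, not the study's own analysis: the trial counts are hypothetical placeholders, and `binomial_tail` is an illustrative helper name.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical evaluation sizes; the real number of probe examples is not given.
for trials in (20, 100, 500):
    successes = trials * 35 // 100  # 35% correct, as in the reported probe accuracy
    p_value = binomial_tail(successes, trials, chance=0.20)
    print(f"trials={trials:4d}  correct={successes:3d}  one-sided p={p_value:.3g}")
```

The same observed accuracy yields a smaller tail probability as the number of trials grows, so the evaluation set size should be reported alongside the probe accuracy.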

Abstract

This work proposes a novel architectural approach to long-term memory in transformer-based language models. We introduce a compact recurrent state combined with spectral decomposition and per-component exponential decay, enabling the model to maintain information across multiple temporal scales. Unlike standard transformers, which rely solely on growing context windows, and unlike existing state-space models, our method explicitly decomposes the recurrent state into a spectral basis with learnable decay rates. This allows different components of memory to operate on different timescales: some capture local context, while others preserve long-term thematic information.

We present results from a minimal viable experiment on a small prototype model (~2 million parameters) trained on synthetic data with controlled topic dynamics. Key findings include:

- State Variation of 0.09, indicating active temporal dynamics in the spectral memory
- Delayed Topic Probe accuracy exceeding 35% at a distance of 128 tokens (random baseline: 20%)
- A reasonable distribution of learned decay rates (mean Γ ≈ 1.2)

These results demonstrate that even at a very small scale, the proposed spectral memory mechanism enables statistically significant retention of thematic information over meaningful distances.

The document also outlines a comprehensive experimental framework for future work, including causal analysis through targeted ablation of spectral components, mutual information analysis between decay rates and information types, and scaling experiments. This research contributes to the growing body of work on efficient long-context modeling by offering an interpretable and structured approach to recurrent memory in transformers.
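
To make the idea of a recurrent state with per-component exponential decay concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the update rule `state = exp(-gamma) * state + basis.T @ x`, the random orthonormal basis obtained by QR, the softplus parameterization that keeps decay rates positive, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def softplus(x):
    """Smooth map from unconstrained values to positive values."""
    return np.logaddexp(0.0, x)


def make_spectral_memory(d_model, n_components, seed=0):
    """Build a basis and raw decay parameters (both assumptions of this sketch)."""
    rng = np.random.default_rng(seed)
    # Orthonormal columns spanning an n_components-dimensional subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((d_model, n_components)))
    # In a trained model these would be learned; here they are random.
    raw_decay = rng.standard_normal(n_components)
    return basis, raw_decay


def step(state, x, basis, raw_decay):
    """Fade each component by its own rate, then write the projected input."""
    gamma = softplus(raw_decay)   # positive decay rate per component
    retention = np.exp(-gamma)    # per-step retention factor in (0, 1)
    return retention * state + basis.T @ x


def read(state, basis):
    """Map the spectral state back to model space."""
    return basis @ state


def half_life(raw_decay):
    """Steps until a component falls to half its value, under this update rule."""
    return np.log(2.0) / softplus(raw_decay)


basis, raw_decay = make_spectral_memory(d_model=16, n_components=4)
state = step(np.zeros(4), np.ones(16), basis, raw_decay)  # write one input
for _ in range(10):                                       # then let it fade
    state = step(state, np.zeros(16), basis, raw_decay)
print("half-lives in steps:", half_life(raw_decay))
print("state after 10 silent steps:", state)
```

Under these assumptions, components with small decay rates keep their content for many steps while components with large rates forget quickly, which is the multi-timescale behavior the abstract describes.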
