Large-context language models are fundamentally bottlenecked by the resident memory needed to store attention keys and values, because the standard transformer key-value (KV) cache grows linearly with context length. This paper introduces a mathematically explicit alternative: a constant-RAM memory architecture that decouples memory requirements from the length of the processed sequence. Long-range continuity is maintained by a selective state-space core whose recurrent state is partitioned into stability-indexed, mixed-precision banks. The quantization tolerance of this non-normal spectral memory operator is governed by its distance from the unit circle, its Jordan depth, and its pseudospectral fragility. To handle rare, sharp episodic discontinuities without a full historical cache, the architecture employs a Fisher-Rao event gate that triggers a bounded attention capsule. The key formal contributions are: (i) a Spectral-Bit Allocation Theorem, proving that precision must scale with both the inverse spectral margin and the Jordan depth; (ii) Pseudospectral Bank Certificates, which protect against non-normal transient amplification under low-bit integer arithmetic; (iii) a Fisher-Minimal Event Gate, which triggers local attention only when the predictive distribution shifts significantly in information geometry; and (iv) a Renormalized Memory Semigroup, which enables hierarchical compression of older states into slower predictive coordinates. Under explicit stability, Lipschitz, and coercivity assumptions, the framework yields bounded recurrent state norms, controlled bankwise quantization errors, and sparse attention correction. The result is a theoretical guarantee of O(1) resident memory complexity as context length approaches infinity, offering a rigorous blueprint for infinite-context AI on consumer-grade hardware.
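To illustrate the event-gate idea, here is a minimal sketch that measures how far a categorical predictive distribution has moved using the standard Fisher-Rao geodesic distance on the probability simplex, d(p, q) = 2 arccos(Σ_i √(p_i q_i)). The gate opens only when that distance exceeds a threshold. The function names `fisher_rao_distance` and `event_gate`, the threshold value of 0.5, and the assumption that the gate compares two successive next-token distributions are all hypothetical illustrations. This is not the paper's Fisher-Minimal Event Gate, whose exact criterion the abstract does not give.

```python
import math
from typing import Sequence


def fisher_rao_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Fisher-Rao geodesic distance between two categorical distributions.

    Computes 2 * arccos(sum_i sqrt(p_i * q_i)). The result ranges from 0
    (identical distributions) to pi (disjoint supports).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Bhattacharyya coefficient: inner product of square-root densities.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp floating-point rounding so acos stays in its domain.
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)


def event_gate(prev: Sequence[float], curr: Sequence[float], threshold: float) -> bool:
    """Hypothetical gate: open the attention capsule only when the predictive
    distribution moves more than `threshold` in Fisher-Rao distance."""
    return fisher_rao_distance(prev, curr) > threshold


if __name__ == "__main__":
    steady = [0.70, 0.20, 0.10]
    drift = [0.68, 0.21, 0.11]  # small shift, distance about 0.045
    jump = [0.05, 0.15, 0.80]   # sharp discontinuity, distance about 1.75
    print(event_gate(steady, drift, threshold=0.5))  # False: gate stays closed
    print(event_gate(steady, jump, threshold=0.5))   # True: gate opens
```

Working in square-root coordinates makes the simplex a portion of a sphere, so the distance is bounded and insensitive to how the vocabulary is scaled. This boundedness makes a single fixed threshold a plausible trigger rule in a sketch like this.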
Author: Prithvidev Kamboj.