What question did this study set out to answer?

The research aims to understand long-context degradation in transformer-based language models using information geometry and extreme value theory.

April 16, 2026 · Open Access

Information-Geometric Context Window Governance and the Probabilistic Theory of Long-Context Collapse in Large Language Models

Key Points

Developed a theoretical framework combining information geometry and extreme value theory.
Defined observer entropy via Kullback–Leibler divergence under coarse-graining.
Analyzed attention collapse using extreme value theory for weakly dependent logit maxima.
Formulated a control-theoretic response with a phase-aware governor for context management.
Established a quadratic scaling law for observer entropy governed by the Fisher information matrix.
Showed that, for any finite signal strength, observer entropy vanishes in the long-context limit, implying that full information retention is impossible under softmax attention.
Derived bounds linking observer entropy to signal strength and context length.
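
The quadratic scaling law in the key points has the same form as the standard second-order expansion of Kullback–Leibler divergence, KL(p_θ ‖ p_{θ+εv}) ≈ ½ ε² vᵀ I(θ) v. The sketch below checks that expansion numerically for a small softmax distribution. It is an illustration only: it assumes the perturbation is a plain shift of the logits by εv, which may differ from the paper's coarse-graining construction, and the function names and the values of `theta` and `v` are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions on the same support.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fisher_quadratic_form(p, v):
    # For softmax logits the Fisher matrix is diag(p) - p p^T,
    # so v^T I(theta) v equals the variance of v under p.
    mean_v = sum(pi * vi for pi, vi in zip(p, v))
    return sum(pi * (vi - mean_v) ** 2 for pi, vi in zip(p, v))

def compare(theta, v, eps):
    # Returns (exact KL, leading-order quadratic approximation).
    p = softmax(theta)
    q = softmax([t + eps * vi for t, vi in zip(theta, v)])
    exact = kl_divergence(p, q)
    approx = 0.5 * eps ** 2 * fisher_quadratic_form(p, v)
    return exact, approx

# Illustrative values, not taken from the paper.
theta = [0.5, -1.0, 2.0, 0.0]
v = [1.0, 0.0, -1.0, 0.5]

for eps in (0.1, 0.01, 0.001):
    exact, approx = compare(theta, v, eps)
    print(f"eps={eps:g}  KL={exact:.3e}  quadratic={approx:.3e}  ratio={exact / approx:.4f}")
```

The ratio of the exact divergence to the quadratic term should approach 1 as ε shrinks, which is what an O(ε³) remainder predicts.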

Abstract

This work develops a unified theoretical framework for long-context degradation in transformer-based large language models, combining information geometry and extreme value theory. The central quantity is the observer entropy $S_{\mathrm{obs}}(p_\theta, \varepsilon)$, defined via Kullback–Leibler divergence under coarse-graining. The main result (Bridge Theorem) establishes the quadratic scaling law $S_{\mathrm{obs}} = \tfrac{1}{2}\varepsilon^2\, v(\theta)^\top I(\theta)\, v(\theta) + O(\varepsilon^3)$, showing that information loss is governed at leading order by the Fisher information matrix. On the probabilistic side, attention collapse is analysed using extreme value theory for weakly dependent logit maxima. This yields a closed-form probabilistic risk law and leads to a Fundamental Impossibility Theorem: for any finite signal strength, observer entropy vanishes in the long-context limit, implying that full information retention is impossible under softmax attention. These results are connected through bounds of the form $S_{\mathrm{obs}}(L) \le c_2\, e^{\mu_L}/L$, where $\mu_L = \mu_s - \sigma\sqrt{2 \log L_{\mathrm{eff}}}$, providing a unified information-theoretic characterization of long-context collapse. A control-theoretic response is formulated via the CPL 4.0 phase-aware governor, which enforces a hard context cap, guarantees entropy contraction, and achieves sub-linear fragmentation bounds. The paper includes formal statements, proofs (complete or conditional where explicitly stated), numerical verification, and a proposed experimental protocol for validation on real LLM systems.

Note: This Zenodo version omits non-scientific funding information present in the public DPID release. The scientific content is identical.
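
To make the bound in the abstract concrete, the sketch below evaluates c₂·exp(μ_L)/L with μ_L = μₛ − σ√(2 log L_eff) over a range of context lengths. Several inputs are assumptions rather than values from the paper: c₂ is set to 1 as a placeholder, L_eff is taken equal to L, and μₛ and σ are arbitrary illustrative numbers. The function names are hypothetical.

```python
import math

def mu_L(mu_s, sigma, L_eff):
    # Effective signal margin: mu_s minus the extreme-value
    # correction sigma * sqrt(2 log L_eff). Requires L_eff >= 1.
    return mu_s - sigma * math.sqrt(2.0 * math.log(L_eff))

def entropy_bound(L, mu_s, sigma, c2=1.0, L_eff=None):
    # Upper bound c2 * exp(mu_L) / L on observer entropy.
    # Assumption: L_eff defaults to L; c2 = 1.0 is a placeholder.
    if L_eff is None:
        L_eff = L
    return c2 * math.exp(mu_L(mu_s, sigma, L_eff)) / L

# Illustrative parameters, not taken from the paper.
mu_s, sigma = 5.0, 1.0

for L in (2 ** 10, 2 ** 14, 2 ** 17, 2 ** 20):
    bound = entropy_bound(L, mu_s, sigma)
    print(f"L={L:>8d}  mu_L={mu_L(mu_s, sigma, L):.3f}  bound={bound:.3e}")
```

Under these assumptions the bound falls monotonically with L for any fixed finite μₛ, because the 1/L factor and the growing √(2 log L) correction both push it toward zero. This is consistent with the vanishing-entropy statement in the abstract.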
