What question did this study set out to answer?

The study set out to establish how the spectral gap relates to exponential decay rates in Transformer attention models.

April 3, 2026 · Open Access

Sasaki Bridge Theorem: Spectral Gap Unifies Lyapunov Convergence and Commutator Norm Decay in Transformer Attention, with a Commutativity Scaling Law

Key Points

Proved the Sasaki Bridge Theorem under a positive spectral gap condition.
Verified Assumption A4, on which the theorem is conditioned, in 27.3% of trials for GPT-2 Small; a weaker condition A4' held in all tested layers.
Fitted the Commutativity Scaling Law across seven Transformer models.
Showed that the commutator norm decays exponentially at rate μ/2, where μ is the Lyapunov decay rate.
Obtained C(W) = 0.39·W^−1.14 for the Commutativity Scaling Law, with R² = 0.914.
Identified the scaling exponent b as an empirical invariant that separates architecture classes.
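
The factor of one half in the decay rate can be motivated by a short calculation. This is only a sketch under stated assumptions and is not taken from the paper's proof. Suppose a nonnegative structural divergence functional L(t) satisfies dL/dt ≤ −μL, and suppose the squared commutator norm is bounded above by a constant multiple of L. The constant c₂ below is a hypothetical name introduced for illustration, and the quadratic form of the bound is an assumption.

```latex
\begin{align*}
\frac{dL}{dt} \le -\mu L
  &\;\Longrightarrow\; L(t) \le L(0)\, e^{-\mu t}
  && \text{(Gr\"onwall's inequality)} \\
\bigl\| [X_h(t), X^*] \bigr\|_F^2 \le c_2\, L(t)
  &\;\Longrightarrow\; \bigl\| [X_h(t), X^*] \bigr\|_F
     \le \sqrt{c_2\, L(0)}\; e^{-\mu t / 2}
  && \text{(assumed quadratic upper bound)}
\end{align*}
```

Under these assumptions, taking the square root of a quantity that decays at rate μ gives a norm that decays at rate μ/2.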

Abstract

We present the Sasaki Bridge Theorem, which establishes that under a positive spectral gap λ_gap > 0, the commutator norm ‖[Xₕ(t), X*]‖_F in Transformer attention matrices decays exponentially at rate μ/2, where μ is the Lyapunov decay rate of a structural divergence functional L (Lemmas 1–5, proven). The theorem is conditioned on Assumption A4 (dL/dt ≤ −μL), which is empirically verified in 27.3% of trials for GPT-2 Small; a weaker condition A4' is confirmed in all tested layers. We further report the Commutativity Scaling Law (CSL): C(W) = 0.39·W^−1.14 (R² = 0.914) across seven Transformer models, where W is the attention window width. The scaling exponent b constitutes an empirical invariant (CV = 5.0% across GPT-2 sizes) separating architecture classes: b ≈ 1.05 (causal decoders), b ≈ 0.65 (alternating attention), b ≈ 0.35 (bidirectional encoders). Note: the lower bound constant c₁ = 2λ_gap² is near-vacuous for typical Transformer weights (c₁ ≈ 1.4×10⁻⁶ for GPT-2 Small); the upper bound is the practically informative direction. All main results rely on the upper bound only. We additionally present, explicitly as a conjecture and not a theorem, a three-phase model in which intelligence emergence corresponds to a grokking-type phase transition (Phase 1: transient commutator defect spike, supported by Xu et al., arXiv:2602.16967) followed by re-commutatization (Phase 2: governed by the main theorem). This conjecture is not formally proven within the present framework. Scope: The Sasaki Bridge Theorem describes the post-grokking stabilization phase (Phase 2). The pre-grokking commutator defect spike (Phase 1) is outside the theorem's current scope and is identified as primary future work.
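
As an illustration of how a scaling law of the form C(W) = a·W^−b might be estimated, the following minimal sketch fits a power law by ordinary least squares in log-log space. It does not reproduce the paper's procedure or data. The function name `fit_power_law` and the list of window widths are hypothetical, and the data points are synthetic, noise-free values generated from the reported fit (a = 0.39, b = 1.14), so the fit simply recovers those constants. The R² here is computed in log-log space, which may differ from how the reported R² = 0.914 was obtained.

```python
import math


def fit_power_law(widths, values):
    """Fit values ~ a * widths**(-b) by least squares on log-transformed data.

    Returns (a, b, r_squared), where r_squared is measured in log-log space.
    """
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Closed-form simple linear regression: log C = intercept + slope * log W.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Coefficient of determination of the linear fit.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot

    # a is the exponential of the intercept; b is the negated slope.
    return math.exp(intercept), -slope, r_squared


# Hypothetical window widths; values are synthetic, NOT measurements from the paper.
widths = [16, 32, 64, 128, 256, 512, 1024]
values = [0.39 * w ** -1.14 for w in widths]

a, b, r2 = fit_power_law(widths, values)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.3f}")
```

With real measurements, each architecture class could be fitted separately and the resulting exponents b compared across classes, in the spirit of the class separation described above.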
