May 11, 2026 · Open Access

The Basin Is the Trajectory: The Loop Closure Principle in nanoGPT

Key Points

Aimed to isolate the causal role of recursive token re-injection in autoregressive generation and its effect on trajectory-level structure.
Applied a controlled measurement framework to a bilingual character-level transformer.
Held components, weights, seeds, and per-step distributions fixed while removing only token re-injection.
Ran partial-feedback experiments to observe the dynamics at varying re-injection rates and lags.
With the feedback loop severed, per-step divergence failed to accumulate (the JSD slope sat at machine zero) and regime structure collapsed.
Under the same intervention in 4 jointly trained bilingual variants, recursive fidelity dropped from 0.86 ± 0.06 to 0.49 ± 0.04, while static fidelity was preserved (0.87 ± 0.02 to 0.88 ± 0.03).
With the loop closed, the divergence rate was positive (0.691 ± 0.021 nats/char) at every tested temperature, consistent with the feedback loop being necessary for divergence to accumulate.

Abstract

We introduce a controlled measurement framework for isolating the causal role of recursive token re-injection in autoregressive generation. Applied to a bilingual character-level transformer, the framework asks whether the recursive feedback loop is causally necessary for the trajectory-level structure observed in generation. The intervention holds components, weights, seeds, and per-step distributions fixed, removing only token re-injection. With the loop severed, per-step divergence fails to accumulate (the JSD slope sits at machine zero) and regime structure collapses. Close the loop and persistent regime-switching dynamics return, with a positive divergence rate (0.691 ± 0.021 nats/char across 4 seeds × 50 rollouts each; across-seed std, per-seed means from sanity p = 1 conditions; see Table 11) at every tested temperature. The same intervention inside 4 jointly trained bilingual variants (MESPT, seeds 42–45) replicates the contrast quantitatively: recursive fidelity drops from 0.86 ± 0.06 to 0.49 ± 0.04 while static fidelity is preserved (0.87 ± 0.02 → 0.88 ± 0.03). The basin label commits over the first ∼1200 characters of recursive generation (mean commitment time 24.4 windows), not earlier. The label does not separate under geometric clustering at any layer, for any seed tested. Four independent methods, including an endogenous decomposition that requires no reference models, agree on the same partition. Partial-feedback experiments show that any nonzero re-injection produces dynamics; steady-state divergence is invariant to feedback rate (across three orders of magnitude) and to re-injection lag (up to 100 steps). Content invariance holds at the level of plateau existence (both uniform-random and marginal-sampled feedback sustain a positive D_rate), but the plateau height is content-dependent (∼35% spread across real, uniform, and marginal feedback).
A same-language negative control (a mixture of two MES variants trained with different random seeds) yields a D_rate 22× smaller than the bilingual baseline, confirming that the mechanism requires distinguishable components. The theory’s sufficient conditions for exact metastability do not hold in this setting, yet a mechanism consistent with the theory operates regardless. These results identify the Loop Closure Principle (an empirical invariance class) in nanoGPT: recursive feedback is the necessary enabler that converts per-step component disagreement into trajectory-level divergence. Once the loop is active (p > 0, distinguishable components), steady-state divergence exhibits empirical invariances over the tested ranges of rate, content, and timing. This is an empirically established property of recursive autoregressive composition in this controlled construction. The structural independence of this property from the standard sufficient conditions is verified formally in a synthetic finite-discrete recursive system in Appendix C; this is a logical-independence result in a designed model, not a theorem about transformers or any continuous process. The Principle is established within a controlled autoregressive mixture; its applicability beyond this regime remains an empirical question.
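
The abstract reports per-step divergence as a Jensen-Shannon divergence (JSD) in nats and summarizes its accumulation as a slope. The sketch below shows one plausible way to compute such quantities; it is not the paper's code. The function names `jsd_nats` and `cumulative_slope` are hypothetical, and reading "JSD slope" as the least-squares slope of the cumulative per-step JSD against the step index is an assumption, since the abstract does not give the exact definition.

```python
import math


def kl_nats(a, b):
    """KL(a || b) in nats for discrete distributions; terms with a_i == 0 contribute 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def jsd_nats(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in nats.

    Standard definition: 0.5*KL(p||m) + 0.5*KL(q||m) with m = (p+q)/2.
    Inputs are renormalized so that each sums to 1.
    """
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_nats(p, m) + 0.5 * kl_nats(q, m)


def cumulative_slope(per_step):
    """Least-squares slope of the running sum of per-step values vs. step index.

    ASSUMPTION: this is one possible reading of a "divergence slope";
    the paper's actual estimator may differ.
    """
    if len(per_step) < 2:
        raise ValueError("need at least two steps to fit a slope")
    ys, total = [], 0.0
    for v in per_step:
        total += v
        ys.append(total)
    n = len(ys)
    xs = list(range(1, n + 1))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Two components that disagree at every step accumulate divergence linearly;
# two that agree at every step give a flat cumulative curve.
disagree = [jsd_nats([0.9, 0.1], [0.1, 0.9]) for _ in range(20)]
agree = [jsd_nats([0.5, 0.5], [0.5, 0.5]) for _ in range(20)]
slope_disagree = cumulative_slope(disagree)
slope_agree = cumulative_slope(agree)
```

In this toy usage the per-step distributions are fixed by hand, so it illustrates only the measurement, not the loop-severing intervention itself. With the natural logarithm, JSD is bounded above by ln 2 nats, which it reaches for distributions with disjoint support.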
