I present a systematic experimental program that applies thermodynamics, dynamical systems theory, and cosmological analogies to characterize the internal dynamics of Transformer-based large language models (LLMs). Through 84 experiments across three architectures (Qwen2.5-1.5B, Qwen2.5-0.5B, TinyLlama-1.1B), I identify five universal laws governing Transformer computation:
1. Boltzmann Distribution Law: hidden-state activations follow p(E) ∝ exp(−E/kT) with R² = 0.978 across all architectures (coefficient of variation CV = 0.001).
2. Negative Specific Heat: all models exhibit C_v < 0 (p < 0.001), meaning that energy increases as temperature decreases, the hallmark of self-gravitating systems.
3. Inverse Radiation Law: luminosity scales as L ∝ T^n with n = −1.44 ± 0.42, opposite in sign to the Stefan-Boltzmann law (L ∝ T^4).
4. Carnot Efficiency Constant: the thermodynamic efficiency η = 0.813 ± 0.036 (CV = 0.044) is stable across architectures and is the tightest universal constant found in this study.
5. Information Concentration Law: free energy increases through the layers, so LLMs act as "information refrigerators" that violate the Free Energy Principle, concentrating rather than dissipating information.
Additional findings include: (a) FFN layers contribute 67–73% of the representational force ("dark energy"), with a critical phase transition at β_c ≈ 0.57; (b) iterative token feeding causes "black hole collapse" (a T → 0 singularity); (c) the ergodic hypothesis holds for structural variables (participation ratio) but fails for semantic variables (temperature), revealing a fundamental distinction between structural and semantic degrees of freedom. These findings are synthesized into the Standard Model of Transformers, a unified physical framework with direct engineering applications in hallucination detection, model compression, and adversarial robustness.
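The quantities reported above (an exponential fit with its R², the sign of C_v, a power-law exponent n, an efficiency η, coefficients of variation, and a participation ratio) can in principle be checked with ordinary least-squares fits. The sketch below is not the author's code. It assumes the energies E, temperatures T, and luminosities L have already been extracted from a model as NumPy arrays, because their exact definitions are not given in this abstract. It also assumes the textbook Carnot form η = 1 − T_c/T_h and the common participation-ratio definition PR = (Σλ)²/Σλ² over covariance eigenvalues. All function names are hypothetical.

```python
import numpy as np


def boltzmann_fit(energies, bins=50, tail_quantile=0.99):
    """Fit log p(E) = -E/kT + c to a histogram of energies; return (kT, R^2).

    Assumption: `energies` is a 1-D array of non-negative values. The sparse
    upper tail (above `tail_quantile`) is dropped so that near-empty bins do
    not dominate the log-space residuals.
    """
    energies = np.asarray(energies, dtype=float)
    upper = np.quantile(energies, tail_quantile)
    counts, edges = np.histogram(energies, bins=bins, range=(0.0, upper), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0  # log p(E) is undefined for empty bins
    x, y = centers[mask], np.log(counts[mask])
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return -1.0 / slope, r2


def specific_heat(temperatures, energies):
    """Estimate C_v = dE/dT as the slope of a linear fit of E against T.

    A negative value corresponds to the 'negative specific heat' regime.
    """
    t = np.asarray(temperatures, dtype=float)
    e = np.asarray(energies, dtype=float)
    slope, _ = np.polyfit(t, e, 1)
    return slope


def power_law_exponent(temperatures, luminosities):
    """Estimate n in L ∝ T^n from a straight-line fit of log L against log T."""
    n, _ = np.polyfit(np.log(temperatures), np.log(luminosities), 1)
    return n


def carnot_efficiency(t_cold, t_hot):
    """Textbook Carnot efficiency eta = 1 - T_c / T_h (assumed form)."""
    return 1.0 - t_cold / t_hot


def coefficient_of_variation(values):
    """CV = sample standard deviation / |mean|, e.g. of one constant across models."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / abs(values.mean())


def participation_ratio(hidden):
    """PR = (sum of eigenvalues)^2 / sum of squared eigenvalues of the
    covariance of `hidden` (shape: samples x features)."""
    eig = np.linalg.eigvalsh(np.cov(hidden, rowvar=False))
    return eig.sum() ** 2 / np.sum(eig**2)


# Usage on synthetic data: exponentially distributed "energies" with kT = 2.
rng = np.random.default_rng(0)
kT, r2 = boltzmann_fit(rng.exponential(scale=2.0, size=200_000))
print(round(kT, 2), round(r2, 3))  # kT should be close to 2.0 and R^2 close to 1
```

With real activations, each estimator would be applied once per architecture and the resulting kT, n, or η values passed to coefficient_of_variation to judge universality. Log-space least squares is used here only because it is the simplest option; maximum-likelihood exponential or power-law fits would be a more robust alternative.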
Changes from V1: expanded from 33 to 84 experiments; added cross-architecture universality validation (3 models); established 5 universal laws with coefficient-of-variation analysis; tested the ergodic hypothesis; added 12 publication-quality figures; increased length from 10 to 17 pages.
Code: https://github.com/hafufu-stack/Standard-Model-of-Transformers

Acknowledgments
This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
Author: Hiroto Funasaki