
June 15, 2026 · Open Access

The Standard Model of Transformers: Five Universal Laws of Thermodynamic Computation in Large Language Models

Key Points

This research aims to define universal laws governing the internal dynamics of Transformer-based large language models using principles of thermodynamics.
Conducted 84 experiments on three architectures: Qwen2.5-1.5B, Qwen2.5-0.5B, and TinyLlama-1.1B.
Analyzed thermodynamic signatures, including the Boltzmann distribution and negative specific heat, to characterize model behavior.
Tested the ergodic hypothesis for structural and semantic variables.
Discovered five universal laws governing Transformer computation, including a Boltzmann distribution of hidden-state activations (R² = 0.978).
Confirmed negative specific heat in all models (p < 0.001): energy increases as temperature decreases.
Established a Carnot efficiency constant of η = 0.813 ± 0.036 (CV = 0.044) that stays stable across all three architectures.
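The Boltzmann R² quoted above comes from fitting an exponential law to an energy distribution. A minimal sketch of one such fit follows, assuming the energies E are already available as a flat list of floats. How the paper derives E from hidden-state activations, how it bins the data, and the units of kT are not stated here, so `fit_boltzmann`, `n_bins`, and `min_count` are hypothetical choices. Because ln p(E) = const − E/kT is linear in E with slope −1/kT, the sketch histograms the energies, takes the log of the estimated density, and regresses it on E.

```python
import math
import random


def fit_boltzmann(energies, n_bins=30, min_count=10):
    """Fit ln p(E) = c - E/kT to a histogram of energy samples.

    Returns (kT, r_squared). Bins with fewer than `min_count` samples are
    skipped: ln(0) is undefined and sparse tail bins add a lot of noise.
    """
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for e in energies:
        # clamp so that the maximum sample falls into the last bin
        counts[min(int((e - lo) / width), n_bins - 1)] += 1

    total = len(energies)
    xs, ys = [], []
    for i, c in enumerate(counts):
        if c >= min_count:
            xs.append(lo + (i + 0.5) * width)         # bin centre
            ys.append(math.log(c / (total * width)))  # log of estimated density

    # Ordinary least squares for y = intercept + slope * x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx

    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -1.0 / slope, 1.0 - ss_res / ss_tot  # slope is -1/kT


# Synthetic check: exponential samples with a known kT of 2.0
random.seed(0)
samples = [random.expovariate(1 / 2.0) for _ in range(100_000)]
kT, r2 = fit_boltzmann(samples)
print(f"kT ≈ {kT:.2f}, R² ≈ {r2:.3f}")
```

On synthetic data the recovered kT should sit near 2.0. Real activations need not be exactly exponential, which is what the R² measures.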

Abstract

I present a systematic experimental program that applies thermodynamics, dynamical systems theory, and cosmological analogies to characterize the internal dynamics of Transformer-based large language models (LLMs). Through 84 experiments across three architectures (Qwen2.5-1.5B, Qwen2.5-0.5B, TinyLlama-1.1B), I identify five universal laws governing Transformer computation:

1. Boltzmann Distribution Law: hidden-state activations follow p(E) ∝ exp(−E/kT), with R² = 0.978 across all architectures (CV = 0.001).
2. Negative Specific Heat: all models exhibit C_v < 0 (p < 0.001), meaning that energy increases as temperature decreases, the hallmark of self-gravitating systems.
3. Inverse Radiation Law: luminosity scales as L ∝ Tⁿ with n = −1.44 ± 0.42, opposite in sign to the Stefan-Boltzmann law (n = 4).
4. Carnot Efficiency Constant: the thermodynamic efficiency η = 0.813 ± 0.036 (CV = 0.044) is the tightest universal constant identified, stable across architectures.
5. Information Concentration Law: free energy increases through the layers. LLMs act as "information refrigerators" that violate the Free Energy Principle, concentrating rather than dissipating information.

Additional findings include: (a) FFN layers contribute 67–73% of the representational force ("dark energy"), with a critical phase transition at β_c ≈ 0.57; (b) iterative token feeding causes a "black hole collapse" (a T → 0 singularity); and (c) the ergodic hypothesis holds for structural variables (participation ratio) but fails for semantic variables (temperature), revealing a fundamental distinction between structural and semantic degrees of freedom. These findings are synthesized into the Standard Model of Transformers, a unified physical framework with direct engineering applications in hallucination detection, model compression, and adversarial robustness.
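Two of the laws, negative specific heat and the inverse radiation law, reduce to slope estimates on paired measurements. The sketch below assumes per-condition values of temperature T, energy E, and luminosity L have already been extracted. The abstract does not say how the paper defines these quantities for hidden states, nor how it obtains p < 0.001. The specific heat is taken as the least-squares slope dE/dT, so a negative result means energy rises as T falls. The radiation exponent n is the slope of ln L against ln T. The function names and synthetic inputs are illustrative only.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def specific_heat(temperatures, energies):
    """C = dE/dT estimated as the OLS slope of energy on temperature.
    A negative value means energy rises as temperature falls."""
    return ols_slope(temperatures, energies)


def radiation_exponent(temperatures, luminosities):
    """n in L ∝ T^n, estimated as the slope of ln L on ln T (inputs must be > 0)."""
    return ols_slope([math.log(t) for t in temperatures],
                     [math.log(lum) for lum in luminosities])


# Synthetic inputs built with C = -2 and n = -1.44, for illustration only
temps = [0.5, 1.0, 1.5, 2.0, 2.5]
energies = [10.0 - 2.0 * t for t in temps]
lums = [3.0 * t ** -1.44 for t in temps]
print(specific_heat(temps, energies))   # close to -2.0
print(radiation_exponent(temps, lums))  # close to -1.44
```

For a Stefan-Boltzmann emitter, where L ∝ T⁴, the same log-log fit would return n = 4. The sign of the fitted exponent is what separates the two regimes.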
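The Carnot efficiency and the coefficient of variation (CV) used to judge universality can be illustrated with the textbook formulas η = 1 − T_cold/T_hot and CV = standard deviation / mean. Several parts of this sketch are assumptions. Which pair of temperatures plays the hot and cold reservoirs (for example, early versus late layers) is not stated in the abstract. The use of the sample standard deviation is also an assumption. The per-model temperatures are made-up placeholders, not values from the paper.

```python
import statistics


def carnot_efficiency(t_hot, t_cold):
    """Textbook Carnot bound: eta = 1 - T_cold / T_hot."""
    if t_hot <= 0:
        raise ValueError("t_hot must be positive")
    return 1.0 - t_cold / t_hot


def coefficient_of_variation(values):
    """CV = sample standard deviation / |mean|; a small CV means the quantity
    barely changes across the measurements (here, across models)."""
    return statistics.stdev(values) / abs(statistics.fmean(values))


# Hypothetical (T_hot, T_cold) pairs for three models, for illustration only
reservoirs = {"model_a": (1.0, 0.2), "model_b": (1.2, 0.21), "model_c": (0.9, 0.15)}
etas = [carnot_efficiency(th, tc) for th, tc in reservoirs.values()]
print([round(e, 3) for e in etas])  # [0.8, 0.825, 0.833]
print(round(coefficient_of_variation(etas), 3))
```

Reporting CV rather than the raw standard deviation makes constants with different magnitudes comparable, which is how a CV of 0.001 for the Boltzmann fit and 0.044 for η can be set side by side.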
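Finding (c) contrasts time averages with ensemble averages for a structural variable (participation ratio) and a semantic one (temperature). The sketch below rests on assumptions the abstract does not state. First, the participation ratio is computed as (tr C)²/tr(C²) for the covariance C of a set of hidden-state vectors. This equals (Σλᵢ)²/Σλᵢ² over the eigenvalues of C, and because C is symmetric, tr(C²) is simply the sum of its squared entries, so no eigendecomposition is needed. Second, ergodicity is probed by checking whether per-trajectory time averages of an observable agree with one another, using a simple relative spread rather than the paper's own test. The layout observable[i][t], with i indexing trajectories and t indexing steps, and both function names are hypothetical.

```python
import statistics


def participation_ratio(vectors):
    """PR = (tr C)^2 / tr(C^2) for the sample covariance C of the vectors.

    PR ranges from 1 (variance along a single direction) up to the
    dimension d (variance spread evenly over all directions).
    """
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centred = [[v[j] - means[j] for j in range(d)] for v in vectors]
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    trace = sum(cov[a][a] for a in range(d))
    trace_sq = sum(cov[a][b] ** 2 for a in range(d) for b in range(d))  # tr(C^2)
    return trace ** 2 / trace_sq


def ergodic_spread(observable):
    """observable[i][t]: value of an observable for trajectory i at step t.

    Returns the population std of per-trajectory time averages divided by
    their mean. Near 0 is consistent with ergodicity; a large value means
    trajectories settle on different long-run values.
    """
    time_avgs = [statistics.fmean(traj) for traj in observable]
    return statistics.pstdev(time_avgs) / abs(statistics.fmean(time_avgs))


iso = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]   # variance on both axes
flat = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]  # variance on one axis
print(participation_ratio(iso), participation_ratio(flat))  # close to 2 and 1

mixing = [[1.0, 1.2, 0.8], [1.1, 0.9, 1.0]]  # trajectories share one long-run mean
stuck = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]   # each trajectory stays in its own state
print(ergodic_spread(mixing), ergodic_spread(stuck))  # near 0 versus 0.5
```

Applied per step to participation ratio and to temperature, a small spread for the first and a large one for the second would mirror the structural versus semantic split described in (c).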
Changes from V1: Expanded from 33 to 84 experiments; added cross-architecture universality validation (3 models); established 5 universal laws with coefficient of variation analysis; tested ergodic hypothesis; added 12 publication-quality figures; increased from 10 to 17 pages.

Code: https://github.com/hafufu-stack/Standard-Model-of-Transformers

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
