I present a systematic experimental program that applies thermodynamics, dynamical systems theory, and cosmological analogies to characterize the internal dynamics of Transformer-based large language models (LLMs). Through 33 experiments on Qwen2.5 models (0.5B and 1.5B parameters), I discover that:

Attention functions as a contractive "gravitational" force with negative specific heat (dU/dT ≈ −18), a universal constant independent of model scale;
Feed-Forward Networks contribute 67–73% of the total representational force, functioning as "dark energy";
The Lyapunov exponent is consistently negative (λ = −0.05), proving Transformers are stable attractors;
Information exhibits anti-lensing (cos = −0.15), repelling from high-norm tokens;
A thermodynamic firewall monitoring PR×T variance achieves AUC = 0.88 for hallucination detection;
Dark energy suppression reveals a critical phase transition at β = 0.57.

These findings are synthesized into the Standard Model of Transformers, a unified physical framework with direct engineering applications in hallucination detection, model compression, and adversarial robustness.

Code: https://github.com/hafufu-stack/Standard-Model-of-Transformers

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
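For readers unfamiliar with the stability measure quoted in the abstract, the following is a minimal sketch of how a per-layer Lyapunov exponent can be estimated in general: push a state and a slightly perturbed copy through a stack of layers and average the log growth of their separation, λ ≈ (1/L) Σ ln(|δ_out| / |δ_in|). It is not taken from the linked repository. The function name `layerwise_lyapunov`, the toy layers, and the perturbation size `eps` are all hypothetical and chosen only for illustration. A negative result means nearby states converge, which is the sense of "stable" used above.

```python
import numpy as np

def layerwise_lyapunov(layers, x0, eps=1e-6, seed=0):
    """Average log growth rate, per layer, of a small perturbation.

    layers: list of callables mapping a state vector to a state vector
            (hypothetical stand-ins for Transformer blocks).
    x0:     initial state vector.
    """
    rng = np.random.default_rng(seed)
    delta = rng.standard_normal(x0.shape)
    delta *= eps / np.linalg.norm(delta)   # perturbation of norm eps

    x, y = x0.copy(), x0 + delta
    log_growth = 0.0
    for f in layers:
        x, y = f(x), f(y)
        d = np.linalg.norm(y - x)
        log_growth += np.log(d / eps)      # growth over this layer
        y = x + (y - x) * (eps / d)        # rescale separation back to eps
    return log_growth / len(layers)

# Toy check with an assumed linear contraction h -> 0.9 * h at every layer.
toy_layers = [lambda h: 0.9 * h] * 12
lam = layerwise_lyapunov(toy_layers, np.ones(16))
print(lam, np.log(0.9))                    # the two values should agree closely
```

For the toy contraction the estimate matches ln(0.9) by construction; applying the same idea to a real model would mean replacing `toy_layers` with the model's blocks and the state with its hidden activations, which is an assumption about procedure rather than a description of the experiments reported here.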
Hiroto Funasaki
Hiroto Funasaki (Thu,) studied this question.
synapsesocial.com/papers/6a2268d7763171746d5476ea — DOI: https://doi.org/10.5281/zenodo.20533786