Modern machine learning has achieved remarkable empirical success, yet its theoretical underpinnings remain incomplete. This paper develops a unified framework for understanding learning through the first principles of statistical physics. Within this view, a model's loss function corresponds to an energy landscape, the training process to stochastic dynamics, and generalization to an entropy-driven preference for flat minima and phase-transition behavior. Energy-Based Models (EBMs) are analyzed as a central case study. This study shows how their mathematical form arises naturally from the Principle of Maximum Entropy, and how their latent-variable structures can be interpreted as learning an effective theory of the data. The learning dynamics are also given a direct physical interpretation, with training algorithms like Contrastive Divergence representing non-equilibrium relaxation processes. In particular, we highlight that the Hessian-dependent gradient noise in Stochastic Gradient Descent emerges as the key mechanism driving an exponential preference for the flat minima associated with good generalization. Beyond explanation, this framework offers constructive principles. Symmetry, conservation laws, and duality provide powerful inductive biases for building data-efficient and interpretable models, as exemplified by Physics-Informed Neural Networks and compositional generative architectures. We conclude by pointing toward future research directions, including exploring the limits of compositionality through the lens of spin glass theory and developing a non-equilibrium thermodynamic framework for learning dynamics.
Xiaochen Liu (Tue,) studied this question.
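The abstract's claim that the EBM form follows from the Principle of Maximum Entropy can be illustrated with the standard textbook derivation. This is a minimal sketch, not the paper's own derivation: it assumes a discrete state space and a single constraint on the mean energy, and the symbols $p$, $E$, $U$, $\lambda$, $\beta$, and $Z$ are notation chosen here for illustration.

```latex
% Maximize the Shannon entropy subject to normalization and a fixed mean energy U
% (assumed setup: discrete states x, one energy constraint).
\begin{align}
  \mathcal{L}[p] &= -\sum_x p(x)\ln p(x)
      - \lambda\Big(\sum_x p(x) - 1\Big)
      - \beta\Big(\sum_x p(x)\,E(x) - U\Big), \\
  \frac{\partial \mathcal{L}}{\partial p(x)} &= -\ln p(x) - 1 - \lambda - \beta E(x) = 0, \\
  p(x) &= \frac{e^{-\beta E(x)}}{Z}, \qquad Z = \sum_x e^{-\beta E(x)} = e^{1+\lambda}.
\end{align}
```

Under these assumptions the stationary distribution is the Boltzmann form. Replacing $E(x)$ with a parameterized $E_\theta(x)$ and absorbing $\beta$ into the parameters gives the usual EBM density $p_\theta(x) \propto e^{-E_\theta(x)}$.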