We propose a unified framework that links two recent results from apparently disjoint fields: the $v^{2/3}$ scaling of the excess work in finite-time first-order phase transitions of the kinetic Curie-Weiss model [WCQ25], and the identification of the interpolation threshold in overparameterized linear regression as a non-equilibrium phase transition with broken ergodicity [LG25]. We show that both phenomena are instances of slow passage through a saddle-node bifurcation in a high-dimensional stochastic system whose deterministic mean-field reduction admits a one-dimensional unstable manifold. We render this connection mathematically precise by deriving a canonical normal form $du/ds = u^2 + s$ in both settings via center-manifold reduction with adiabatic elimination of fast modes, recovering the Airy-function representation that controls the $v^{-1/3}$ delay-time scaling and the $v^{2/3}$ excess-work scaling. We then prove that the fluctuation-dissipation theorem (FDT) breakdown observed in the overparameterized regime of dynamical mean-field theory (DMFT) for stochastic gradient descent on linear regression is algebraically equivalent to the persistence of a $1/(i\omega)$ pole in the linear response of a Langevin system possessing a zero-eigenvalue mode of the effective Hessian, with residue equal to the fractional codimension of the row space of the design matrix.

The DMFT analysis extends from linear regression to deep neural networks in two regimes: the lazy / neural-tangent-kernel regime, where the framework applies layer-by-layer with a parameter-weighted layer-additive f-sum rule, and the feature-learning regime, where the order parameter becomes time-dependent and the lazy-to-rich transition is itself an instance of the Airy normal form.
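The normal form $du/ds = u^2 + s$ quoted in the abstract can be checked numerically. The following is a minimal sketch, not the paper's own code. It assumes the unscaled ramp equation $dx/dt = x^2 + vt$ as an illustrative stand-in (the rescaling $x = v^{1/3}u$, $t = v^{-1/3}s$ maps it onto the normal form), integrates it with a hand-written RK4 scheme, and records when $x$ escapes to a large value. The substitution $u = -\psi'/\psi$ turns the normal form into the Airy equation $\psi'' + s\psi = 0$, so the trajectory that tracks the stable branch blows up at the first zero of $\mathrm{Ai}(-s)$, $s^* \approx 2.338$, and the escape time should scale as $t^* \approx 2.338\,v^{-1/3}$. The name `escape_time` and all of its parameters are hypothetical.

```python
import math

# Magnitude of the first zero of the Airy function Ai (a_1 = -2.33810741...).
AIRY_ZERO = 2.33810741


def escape_time(v, h0=5e-3, x_escape=1e6, s_start=-10.0):
    """Integrate dx/dt = x**2 + v*t with RK4; return the time when x > x_escape.

    Assumption: the ramp equation and every parameter here are illustrative
    choices, not taken from the paper.
    """
    def f(t, x):
        return x * x + v * t

    # Start well before the bifurcation at t = 0 (s_start in rescaled time).
    t = s_start * v ** (-1.0 / 3.0)
    # Begin on the stable quasi-static branch x = -sqrt(-v t).
    x = -math.sqrt(-v * t)

    while x < x_escape:
        # Shrink the step as |x| grows so the blow-up stays resolved.
        dt = h0 / (1.0 + abs(x))
        k1 = f(t, x)
        k2 = f(t + 0.5 * dt, x + 0.5 * dt * k1)
        k3 = f(t + 0.5 * dt, x + 0.5 * dt * k2)
        k4 = f(t + dt, x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return t


# Sweep the ramp rate and print v, escape time, and rescaled escape time.
for v in (1.0, 0.1, 0.01):
    t_esc = escape_time(v)
    print(v, t_esc, t_esc * v ** (1.0 / 3.0))
```

The last printed column, $t^* v^{1/3}$, should stay close to 2.338 for every $v$, which is the $v^{-1/3}$ delay-time scaling. The corresponding overshoot of the control parameter, $v t^*$, then scales as $v^{2/3}$.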
We further extend the framework to three architectural ingredients of modern networks via dedicated propositions: multi-head attention (conditional on a closure hypothesis we make explicit and later prove for $\varepsilon$-approximate closure), residual connections via a path-sum f-sum rule on the network's directed acyclic graph, and normalization layers via an equivariant order parameter that quotients out the continuous symmetries introduced by LayerNorm and BatchNorm. A direct subsection reconciles the apparent dynamic-static asymmetry between the Curie-Weiss sweep and the linear-regression quenched-geometry settings by identifying training time, not the capacity $\alpha$, as the relevant temporal variable in the latter; the deep-network feature-learning case is shown to be the cleanest setting in which all three structural features (external control, codimension-1 singularity, finite traversal rate) are present.

The framework admits a natural categorification: the order parameter $\eta_{\mathrm{erg}} = (\alpha - 1)/\alpha$ is lifted to a functor valued in K-theory classes, the f-sum rule becomes an index pairing on a split-exact sequence in a response category $\mathrm{Resp}(\alpha)$, the $(N, v)$ phase diagram is realized as a Conley persistence module, and the Airy normal form is identified as the universal 1-morphism through a saddle-node object in a 2-category GSP of generalized slow passages. The two structural obstructions of this 2-category (codimension-$\geq 2$ degeneracies and extended zero-mode submanifolds from continuous loss-symmetries) are closed by extension to a stratified equivariant 3-category GSP•,G, in which every codimension-$k$ degeneracy with $k \leq 4$ (the regime of the Thom-Mather classification) embeds as a universal $k$-morphism, and continuous loss-symmetries are absorbed via equivariant K-theory and the Atiyah-Segal completion theorem. The numerical content of the original framework is recovered as the decategorification.

Four concrete predictions follow from this analytical and categorical structure: (i) the crossover line in the $(N, v)$ plane of the kinetic Curie-Weiss model is logarithmic in $v$ with slope determined by the cusp-catastrophe barrier exponent; (ii) the dynamical critical exponents $(\sigma, \phi) = (1, 2)$ asserted for ridgeless linear regression are consistent with the first-order pole of the Hastie-Montanari-Rosset-Tibshirani static formula [Has+22] when $\sigma$ is correctly read as the pole order and $\phi$ as the dynamical-time exponent, with $E_{\mathrm{gen}}(t, \alpha = 1) \propto \sqrt{t}$ at criticality and an explicit closed-form scaling function; (iii) grokking in algorithmic neural networks should exhibit Airy-type delay statistics with $\tau_{\mathrm{grok}} \sim v_{\mathrm{eff}}^{-1/3}$, sharpened by the categorification to the full Airy distribution rather than the mean alone; and (iv) a secondary K-invariant computed from the categorified Chern character should distinguish overparameterized linear regression, grokking, and anisotropic-covariance regimes even when these share the same $\eta_{\mathrm{erg}}$.

We close with a section that catalogs the framework's open problems (DMFT closure for $L \geq 2$, attention DMFT closure, the anisotropic K-theoretic obstruction, codimension $\geq 5$ in the categorical closure, the microscopic origin of broken ergodicity, and finite-$n$ corrections) and attacks each with a proposed solution. Two are complete rigorous theorems (an explicit critical exponent $\kappa_c(\alpha) = 2(\alpha - 1)/(\alpha + 1)$ for the anisotropic obstruction, and the universality of the response operator under rate-function interpolation), three are partial results (frozen-layer DMFT closure for $L = 2$, $\varepsilon$-DMFT closure for multi-head attention, and a Berry-Esseen bound on finite-$n$ corrections), and one is a research strategy (the derived 3-category DGSP•,G for
Alfredo Sepulveda-Jimenez
QED Labs
Alfredo Sepulveda-Jimenez (Sun,) studied this question.
www.synapsesocial.com/papers/6a1538ebb5d9c58d83e8ca5e — DOI: https://doi.org/10.5281/zenodo.20368045