Grokking, the phenomenon in which neural networks suddenly generalize after prolonged overfitting, has accumulated multiple theoretical explanations since its discovery in 2022, including the Goldilocks Zone, Softmax Collapse, and the Lazy-Rich transition. This paper reviews these theories and identifies their common blind spot: most rely on external measurements and lack a direct characterization of the geometry of representation space. Among them, the Goldilocks Zone theory touches on the "physical laws" of high-dimensional space and carries substantial theoretical value. We propose a unified framework, the Manifold Discovery Hypothesis: memorization is a high-dimensional jagged curve passing through all training points, generalization is the discovery of the low-dimensional manifold on which the data are distributed, and Grokking is the transition from the former to the latter (possibly accompanied by oscillations around a critical state). We provide evidence for this hypothesis from two experimental groups, modular addition and modular multiplication. We observed significant drops in the effective dimensionality of representations (78→8 / 89→11 under a 95% PCA threshold), order-of-magnitude changes in topological summaries, and the emergence of cluster structures in dimensionality-reduced visualizations. Notably, in the modular multiplication experiment the model learned a quotient group structure, namely (k mod 12) cosets with 99.4% purity, which prompted us to revise the hypothesis into a two-stage model: local manifold discovery → global gluing. In one sentence: high-dimensional curve → low-dimensional surface.
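The abstract reports effective dimensionality "under a 95% PCA threshold" and a cluster "purity" for the coset structure, but does not spell out either computation. The following is a minimal sketch under two stated assumptions. First, effective dimensionality is taken to be the smallest number of principal components whose cumulative explained-variance ratio reaches 95%. Second, purity is taken to be the standard clustering purity: the fraction of points that carry the majority true label of their cluster. The helper names `effective_dim` and `cluster_purity` and the synthetic data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def effective_dim(reps: np.ndarray, threshold: float = 0.95) -> int:
    """Hypothetical helper: smallest number of principal components whose
    cumulative explained-variance ratio reaches `threshold`."""
    # Center each feature so the principal directions describe variance about the mean.
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional to the
    # variance captured along each principal direction (sorted descending).
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    cumulative = np.cumsum(var) / var.sum()
    # First index where the cumulative ratio meets the threshold; +1 turns it
    # into a count, clipped to guard against floating-point round-off at 1.0.
    return int(min(np.searchsorted(cumulative, threshold) + 1, len(var)))


def cluster_purity(cluster_ids, true_labels) -> float:
    """Hypothetical helper: fraction of points whose true label equals the
    majority true label of the cluster they were assigned to."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    total = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        # Size of the most common true label inside this cluster.
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return float(total / len(true_labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: isotropic noise spreads variance over many directions...
    noisy = rng.normal(size=(500, 64))
    # ...while a 3-dimensional latent linearly embedded in 64 dims does not.
    low_rank = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 64))
    print(effective_dim(noisy), effective_dim(low_rank))
    # Purity of a clustering scored against hypothetical labels of the form k mod 12.
    k = np.arange(120)
    print(cluster_purity(k % 12, k % 12))  # prints 1.0
```

In this sketch a drop in `effective_dim` means that variance concentrates in fewer directions. A manifold-based reading of the 78→8 figure would need this interpretation, although curved manifolds can still need more linear components than their intrinsic dimension.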
Yanyan Jin
Lei Zhao
DOI: https://doi.org/10.5281/zenodo.18416965