Delayed generalization, termed grokking, occurs in a machine learning calculation when the rise in test accuracy lags behind the rise in training accuracy. This paper examines grokking in a dense neural network trained, in the presence of weight decay, to classify 2D Ising model configurations into four equally spaced energy regions. With the partial aid of novel PCA-based network layer analysis techniques, the observed behavior is interpreted as a transition from a connected network to a group of sparse subnetworks in which the number of active weights in each layer decreases monotonically with depth. This architecture reduces the classification errors that result from a multiplicity of paths. The final network layers, as in a convolutional neural network, sequentially identify global features of the input classes, which enables generalization to previously unseen patterns.
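The classification target described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several details are assumptions not stated above: periodic boundary conditions, a coupling constant J = 1, no external field, and class boundaries that split the full energy range [-2JN, 2JN] of an N-site lattice into four equal intervals. The function names `ising_energy` and `energy_class` are hypothetical.

```python
import numpy as np


def ising_energy(spins, J=1.0):
    """Nearest-neighbour Ising energy E = -J * sum_<ij> s_i s_j.

    Assumes periodic boundaries (an assumption, not taken from the paper).
    Each bond is counted once by pairing every site with its right and
    lower neighbour.
    """
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -J * float(np.sum(spins * (right + down)))


def energy_class(spins, n_classes=4, J=1.0):
    """Map a configuration to one of n_classes equally spaced energy regions.

    The regions span the full range [-2|J|N, 2|J|N] for N sites, which is
    an assumed choice of bin edges.
    """
    n_sites = spins.size
    e_min = -2.0 * abs(J) * n_sites
    e_max = 2.0 * abs(J) * n_sites
    edges = np.linspace(e_min, e_max, n_classes + 1)
    # Only the interior edges are passed, so labels run from 0 to n_classes - 1.
    return int(np.digitize(ising_energy(spins, J), edges[1:-1]))


# Two reference configurations on a 4x4 lattice.
aligned = np.ones((4, 4), dtype=int)
rows, cols = np.indices((4, 4))
checkerboard = np.where((rows + cols) % 2 == 0, 1, -1)
```

Under these assumptions the fully aligned lattice falls in the lowest energy region and the checkerboard lattice in the highest, which gives two easy sanity checks before labelling sampled configurations for training.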
Karolina Hutchison
David Yevick
Physica A: Statistical Mechanics and its Applications
University of Waterloo
Hutchison et al. studied this question.
www.synapsesocial.com/papers/6a05659da550a87e60a1df69 — DOI: https://doi.org/10.1016/j.physa.2026.131659