Abstract. We investigate the generalization properties of over-parameterized, two-layer neural networks in the so-called ‘lazy training’ regime, where the weights remain close to their initial values throughout training. Using a student-teacher framework, we focus on the interplay between random features and first-layer linearization in determining the minimal achievable test error. Our analysis uses tools from statistical mechanics to study a high-dimensional limit in which the numbers of input features and hidden units both tend to infinity at a finite ratio K/N. We find that the random-feature contribution to the student’s output is effectively suppressed when the first-layer weights are also trained, yielding a finite plateau in the generalization error. By explicitly linearizing in the changes of the hidden-unit weights, we derive a closed-form expression for this asymptotic error plateau. Numerical simulations confirm our analytical predictions, showing that the training dynamics converge to a small but finite generalization error that does not vanish even as K, N → ∞. These findings illustrate how training the first-layer weights modifies the results of the random feature model.
Worschech et al. studied this question.
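To make the linearization mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares the full output of a two-layer student with its first-order expansion in the change of the first-layer weights. Several choices here are assumptions that the abstract does not specify: the tanh activation, the 1/sqrt(N) scaling of the pre-activations, fixed second-layer weights `a`, the reading of N as the input dimension and K as the number of hidden units, and all sizes and names (`student_output`, `linearized_output`, `lin_error`).

```python
import numpy as np

def student_output(W, a, X):
    """Full two-layer output: sum_k a_k * tanh(w_k . x / sqrt(N))."""
    N = X.shape[-1]
    return np.tanh(X @ W.T / np.sqrt(N)) @ a

def linearized_output(W0, dW, a, X):
    """First-order expansion of the output around W0 in the weight change dW."""
    N = X.shape[-1]
    h0 = X @ W0.T / np.sqrt(N)       # pre-activations at initialization
    dh = X @ dW.T / np.sqrt(N)       # first-order change of the pre-activations
    g0 = np.tanh(h0)                 # random-feature term (weights frozen at W0)
    dg0 = 1.0 - np.tanh(h0) ** 2     # derivative of tanh evaluated at h0
    return (g0 + dg0 * dh) @ a

# Hypothetical sizes with a finite ratio K/N = 0.5 (illustrative only).
rng = np.random.default_rng(0)
N, K, P = 200, 100, 50
W0 = rng.standard_normal((K, N))         # initial first-layer weights
a = rng.standard_normal(K) / np.sqrt(K)  # fixed second-layer weights (assumption)
X = rng.standard_normal((P, N))          # Gaussian inputs
D = rng.standard_normal((K, N))          # direction of the weight change

def lin_error(eps):
    """Largest gap between the full and linearized outputs for dW = eps * D."""
    dW = eps * D
    full = student_output(W0 + dW, a, X)
    lin = linearized_output(W0, dW, a, X)
    return np.max(np.abs(full - lin))

err_small = lin_error(1e-3)
err_large = lin_error(1e-2)
print(err_small, err_large)
```

Under these assumptions the gap between the two outputs should shrink roughly quadratically with the size of the weight change, which is the sense in which small updates justify the linearized description. The sketch does not reproduce the paper's generalization-error plateau.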