What question did this study set out to answer?

April 21, 2026 | Open Access

Cross-Family Convergence of Neural Network Weight Skeletons: An Empirical Study via Weibull Shape Parameter

Key Points

The study aims to quantify the structural connectivity of neural network weight skeletons using the Weibull shape parameter.
Analyzed weight absolute-value distributions of 10 models across 5 families.
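Drew on the established use of the Weibull shape parameter k for predicting percolation breakthrough in porous media.
Monitored training dynamics through NPM-dk (the rate of change of k) and, on an exploratory basis, NPM-d²k (the second derivative of k).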
Models converged to a terminal k value between 1.13 and 1.19, below the Gaussian baseline.
Convergence held across different initialization strategies, optimizers, training datasets, and model sizes from 70M to 8B parameters.
NPM-dk exhibited a three-phase structure aligned with key training milestones.
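
As a rough illustration of the quantity these points refer to, the sketch below estimates the Weibull shape parameter k of a set of absolute weight values by maximum likelihood, with the location fixed at zero. The function name `weibull_shape_mle`, the bisection bounds, and the synthetic Gaussian sample are assumptions made for illustration. The study's own fitting procedure, including how the central 80–90% body of the distribution is selected, is not described here, so a plain full-sample fit like this one need not reproduce the reported Gaussian baseline of k = 1.205.

```python
import math
import random


def weibull_shape_mle(values, lo=0.05, hi=20.0, iters=60):
    """Maximum-likelihood Weibull shape k for positive data (location = 0).

    Solves the standard profile-likelihood equation
        sum(x^k * ln x) / sum(x^k) - 1/k - mean(ln x) = 0
    by bisection. The left-hand side is increasing in k, so the root
    is unique whenever the data are not all identical.
    """
    logs = [math.log(v) for v in values if v > 0.0]
    if len(logs) < 2:
        raise ValueError("need at least two positive values")
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)  # shift exponents so every weight is <= 1

    def score(k):
        # Weights x^k, rescaled by a common factor that cancels in the ratio.
        pw = [math.exp(k * (l - max_log)) for l in logs]
        weighted_mean_log = sum(p * l for p, l in zip(pw, logs)) / sum(pw)
        return weighted_mean_log - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical stand-in for a weight tensor: Gaussian values, sigma = 0.02.
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(20000)]
    k = weibull_shape_mle(abs(w) for w in weights)
    print(f"estimated Weibull shape k = {k:.3f}")
```

The shape estimate does not depend on the overall scale of the weights, which is why a single k can be compared across models with different weight magnitudes.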

Abstract

What regularities does the weight skeleton of a neural network exhibit during training? Following the framework of the Neural Percolation Model (NPM), and drawing on the established use of Weibull k distributions for predicting percolation breakthrough in porous media, this paper quantifies the structural connectivity of neural network weight skeletons via the Weibull shape parameter k of the weight absolute-value distribution. We find that after sufficient training, 10 models from 5 independent families (Pythia, OLMo, Qwen, Mistral, LLaMA-3) all converge to a terminal k within the narrow band [1.13, 1.19], lying 2–7% below the Gaussian baseline (k = 1.205). This phenomenon holds across different initialization strategies, optimizers, training datasets, and parameter counts spanning roughly 100× (70M to 8B). Body-vs-tail ablation across 7 of these models confirms that the convergence is a property of the central 80–90% of the weight distribution and is masked when the full 100% is fitted. We further propose NPM-dk (the rate of change of k) as a training-dynamics monitor that exhibits a three-phase structure (skeleton formation → anchor → bifurcation), whose timescales align with the break-even point and the rewinding point. The complete three-phase structure is verified across 4 Pythia scales with dense early-step coverage (70m / 160m / 410m / 1B). Pythia-2.8B's three independently retrievable checkpoints (steps 40k / 80k / 143k) additionally fall within the predicted Phase 2/3 window, with Phase 2/3 amplitudes decreasing with depth, consistent with deeper models exhibiting more stable anchors. We also present exploratory observations of NPM-d²k (the second derivative of k) across four Pythia scales, noting its potential for future characterization of training dynamics.

Keywords: weight skeleton, percolation criticality, training dynamics, Weibull distribution, overparameterization
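
To make the NPM-dk and NPM-d²k quantities concrete, here is a minimal sketch that takes a series of k estimates at training checkpoints and forms first and second finite differences. It assumes the rate of change is taken per training step between consecutive checkpoints; the abstract does not say whether the study differentiates with respect to step, log-step, or something else. The names `finite_difference` and `dk_and_d2k` are hypothetical, and the k values in the demo are placeholders, not measurements from the study.

```python
def finite_difference(xs, ys):
    """Slopes between consecutive points, reported at interval midpoints."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mids, slopes = [], []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        if dx <= 0:
            raise ValueError("xs must be strictly increasing")
        mids.append(0.5 * (xs[i] + xs[i + 1]))
        slopes.append((ys[i + 1] - ys[i]) / dx)
    return mids, slopes


def dk_and_d2k(steps, ks):
    """First and second finite differences of k over training steps.

    Assumption: derivatives are taken with respect to the raw step count.
    """
    dk_steps, dk = finite_difference(steps, ks)
    d2k_steps, d2k = finite_difference(dk_steps, dk)
    return (dk_steps, dk), (d2k_steps, d2k)


if __name__ == "__main__":
    # Placeholder checkpoint steps and k values, for illustration only.
    steps = [0, 10, 20, 40]
    ks = [2.0, 1.5, 1.3, 1.2]
    (dk_steps, dk), (d2k_steps, d2k) = dk_and_d2k(steps, ks)
    for s, v in zip(dk_steps, dk):
        print(f"step ~{s:g}: dk/dstep = {v:+.4f}")
    for s, v in zip(d2k_steps, d2k):
        print(f"step ~{s:g}: d2k/dstep2 = {v:+.5f}")
```

Reporting each slope at the interval midpoint keeps unevenly spaced checkpoints from shifting the apparent timing of a phase boundary.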
