What regularities does the weight skeleton of a neural network exhibit during training? Following the framework of the Neural Percolation Model (NPM), and drawing on the established use of Weibull distributions for predicting percolation breakthrough in porous media, this paper quantifies the structural connectivity of neural network weight skeletons via the Weibull shape parameter k fitted to the distribution of absolute weight values. We find that, after sufficient training, 10 models from 5 independent families (Pythia, OLMo, Qwen, Mistral, LLaMA-3) all converge to a terminal k within the narrow band [1.13, 1.19], lying 2–7% below the Gaussian baseline (k = 1.205). This convergence holds across different initialization strategies, optimizers, training datasets, and parameter counts spanning roughly 100× (70M to 8B). A body-vs-tail ablation on 7 of these models confirms that the convergence is a property of the central 80–90% of the weight distribution and is masked when the full 100% is fitted. We further propose NPM-dk (the rate of change of k) as a training-dynamics monitor. It exhibits a three-phase structure (skeleton formation → anchor → bifurcation) whose timescales align with the break-even point and the rewinding point. The complete three-phase structure is verified across 4 Pythia scales with dense early-step coverage (70M, 160M, 410M, and 1B). In addition, the three independently retrievable checkpoints of Pythia-2.8B (steps 40k, 80k, and 143k) fall within the predicted Phase 2/3 window, with Phase 2/3 amplitudes decreasing with depth, which is consistent with deeper models exhibiting more stable anchors. We also present exploratory observations of NPM-d²k (the second derivative of k) across four Pythia scales and note its potential for future characterization of training dynamics.

Keywords: weight skeleton, percolation criticality, training dynamics, Weibull distribution, overparameterization
Tiexin Ding (Sun,) studied this question.
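To make the two quantities named in the abstract concrete, the sketch below estimates a Weibull shape parameter k from absolute weight values and computes a rate of change of k between checkpoints. It is a minimal illustration under stated assumptions, not the paper's pipeline: the abstract does not say which estimator is used, so maximum likelihood with the location fixed at zero is assumed here, and NPM-dk is approximated by a plain finite difference over training steps. The function names `weibull_shape_mle` and `k_rate` are hypothetical. Because the estimator is an assumption, the values it returns need not reproduce the baseline or the band reported above.

```python
import math
import random


def weibull_shape_mle(values, lo=0.05, hi=50.0, iters=60):
    """Estimate the Weibull shape k of |values| by maximum likelihood.

    Assumption: two-parameter Weibull with location fixed at 0.
    The search bracket [lo, hi] is an illustrative choice.
    """
    xs = [abs(v) for v in values if v != 0.0]
    if len(xs) < 2:
        raise ValueError("need at least two non-zero values")
    # Work with logs relative to the maximum: k is scale-invariant,
    # and this keeps exp(k * log) within (0, 1].
    log_max = math.log(max(xs))
    logs = [math.log(x) - log_max for x in xs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for the shape parameter:
        # sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0
        w = [math.exp(k * l) for l in logs]
        weighted = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted - 1.0 / k - mean_log

    # score(k) is increasing in k, so bisection finds its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def k_rate(steps, ks):
    """Finite-difference rate of change of k between consecutive checkpoints."""
    return [
        (ks[i + 1] - ks[i]) / (steps[i + 1] - steps[i])
        for i in range(len(ks) - 1)
    ]


# Sanity check on synthetic data with a known shape (not real model weights).
rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 1.5) for _ in range(5000)]
k_hat = weibull_shape_mle(sample)
```

In practice the input would be the flattened weights of one checkpoint, and `k_rate` would be applied to the resulting k values across checkpoint steps. A body-only fit, as in the ablation described above, would additionally need a rule for selecting the central portion of the distribution, which the abstract does not specify.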