We introduce the spectral slope S(ℓ)—the log-linear decay rate of PCA eigenvalues computed from hidden-state representations at layer ℓ—as a cheap, per-layer diagnostic scalar for Transformer geometry. Across four rounds of experiments on 13 models from 5 architecture families (0.6B–30B parameters, dense and MoE, with varying RL intensity and modality count), we find that (1) per-layer spectral expansion ΔS/L decays monotonically with log N within the Qwen3 family (r=−0.968, p=0.007); (2) output-layer participation ratio PR tracks RL training intensity from 13.3 (base) to 4.3 (extreme RL); (3) chain-of-thought reasoning reverses RL-induced compression at runtime; (4) MoE routing increases aggregate spectral diversity; (5) post-hoc multimodal alignment consumes mid-layer spectral budget; and (6) real-time monitoring during DPO and GRPO training reveals that PR collapse is universal across 3 model sizes and 2 RL methods, with a characteristic danger window at step ~100 invisible to loss and reward curves. Code and toolkit: https://github.com/HenryZ838978/spectral-flow-probe
jing zhang (Wed,) studied this question.