Mixture-of-Experts (MoE) architectures offer a practical route to scaling neural networks by activating only a small subset of experts per token. Their deployment, however, is often hindered by a recurring instability in which routing decisions drift toward a narrow set of experts, culminating in what is commonly termed expert collapse. Although collapse is widely reported, the precursor patterns that lead to it remain poorly characterised, and existing mitigation strategies offer limited insight into how the phenomenon emerges. This paper introduces a lightweight diagnostic framework for analysing how the structure of routing behaviour evolves during training. We propose three complementary metrics, namely Expert Load Entropy ELE(t), Routing Diversity Index RDI(t), and Expert Centrality Drift ECD(t), that together capture tendencies toward concentration, reduced flexibility, and temporal volatility. Building on these metrics, we outline a small family of stabilisation regularisers designed to intervene only mildly, and only when the system begins to drift toward less stable regimes. The framework is architecture-agnostic, requires minimal instrumentation, and aims to give practitioners a clearer vocabulary for interpreting routing dynamics. While empirical validation at scale remains future work, this diagnostic perspective may help situate collapse within a broader class of structural transitions, loosely reminiscent of the precursor signatures studied in nonequilibrium systems (Prigogine & Nicolis, 1977).
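To make the idea of a load-concentration metric concrete, the sketch below computes one plausible reading of ELE(t): the Shannon entropy of the fraction of tokens routed to each expert at a given training step, normalised by log E so that 1.0 indicates perfectly uniform load and 0.0 indicates that every token went to a single expert. This definition, the function name `expert_load_entropy`, and its parameters are assumptions for illustration only; they are not the exact formulation of ELE(t), and RDI(t) and ECD(t) are not sketched here.

```python
import math
from collections import Counter


def expert_load_entropy(assignments, num_experts, normalize=True):
    """Hypothetical ELE(t): entropy of the per-expert load distribution.

    ASSUMPTION: `assignments` lists the expert index chosen for each token
    at one training step. For top-k routing, pass the flattened list of all
    k selections per token.
    """
    if num_experts < 2:
        raise ValueError("need at least two experts")
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing assignments")

    # Shannon entropy (natural log) of the fraction of tokens per expert;
    # experts that received no tokens contribute zero.
    entropy = 0.0
    for expert in range(num_experts):
        p = counts.get(expert, 0) / total
        if p > 0:
            entropy -= p * math.log(p)

    # Dividing by log(E) maps the value into [0, 1]:
    # 1.0 = perfectly balanced load, 0.0 = all tokens on one expert.
    return entropy / math.log(num_experts) if normalize else entropy


if __name__ == "__main__":
    balanced = [0, 1, 2, 3] * 25
    skewed = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10
    collapsed = [2] * 100
    for name, step in [("balanced", balanced), ("skewed", skewed), ("collapsed", collapsed)]:
        print(name, round(expert_load_entropy(step, num_experts=4), 3))
```

Tracking this value across steps gives a time series in which a sustained decline would signal load concentrating on fewer experts. In the usage example, the balanced step scores close to 1.0, the fully collapsed step scores 0.0, and the skewed step falls in between.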
Aure Ecker-Fils (Sat,) studied this question.