We show that the same angular non-uniformity of L2-normalized token embeddings that enables TurboQuant's extreme data compression also enables sublinear routing computation in transformer-style architectures. A fixed Hopf fibration map exploits this structure to produce a routing footprint that scales as K^0.572, versus K^1.0 for dense routing, an advantage that persists at K=5000 (ratio 2.6–2.8×). In a 2-layer trainable language model, fixed geometric routing replaces a learned top-1 gate at a cost of only 8% in validation perplexity and with no learned gate matrix, while using 46 of 64 effective expert paths at convergence (1.4× more efficient than dense routing). A second-dataset replication on WikiText-2 (confirmed across 2 seeds) finds a HOPF/BASELINE ratio of 1.081, numerically identical to the confirmed PTB ratio, under identical training conditions. This result is scoped to the 2-layer toy-scale trainable setting and should not be read as a claim of broad MoE replacement or large-scale transformer substitution. Taken together with TurboQuant, this work suggests that the angular non-uniformity of embeddings has engineering consequences in both data compression and routing computation.
Casey Allard (Thu,) studied this question.
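To make the idea of a fixed, parameter-free geometric router concrete, the following is a minimal sketch of how a Hopf fibration map could assign a token to an expert. It is not the implementation evaluated above. The abstract does not say how embeddings are reduced to four dimensions or how the sphere is partitioned, so the choice of the first four coordinates, the 8 × 8 grid (picked only to match the 64 expert paths mentioned), and the names `hopf_map` and `route` are all assumptions made for illustration.

```python
import math


def hopf_map(a, b, c, d):
    """Standard Hopf map S^3 -> S^2 for a unit 4-vector (a, b, c, d).

    Writing z1 = a + ib and z2 = c + id, the image is
    (2 * z1 * conj(z2), |z1|^2 - |z2|^2), expressed in real coordinates.
    """
    x = 2.0 * (a * c + b * d)
    y = 2.0 * (b * c - a * d)
    z = a * a + b * b - c * c - d * d
    return x, y, z


def route(embedding, n_lat=8, n_lon=8):
    """Hypothetical fixed router: no learned gate matrix, only geometry.

    Assumption: the first four coordinates of the embedding are used and
    L2-normalized onto S^3. The source does not specify this projection.
    """
    v = embedding[:4]
    norm = math.sqrt(sum(t * t for t in v))
    if norm == 0.0:
        return 0  # degenerate input: fall back to expert 0
    a, b, c, d = (t / norm for t in v)
    x, y, z = hopf_map(a, b, c, d)

    # Equal-width bands in z have equal area on the sphere.
    lat = int((z + 1.0) / 2.0 * n_lat)
    lat = max(0, min(lat, n_lat - 1))

    # Azimuth in [-pi, pi] mapped to n_lon equal sectors.
    lon = int((math.atan2(y, x) + math.pi) / (2.0 * math.pi) * n_lon)
    lon = max(0, min(lon, n_lon - 1))

    return lat * n_lon + lon


# Usage: route a toy 6-dimensional embedding to one of 64 expert indices.
expert = route([0.3, -1.2, 0.7, 2.0, 5.0, -0.4])
```

Because the Hopf map is constant along each fiber, multiplying both complex coordinates by the same phase leaves the expert index unchanged. In this sketch the router therefore depends only on the angular position of the token on S^2 and has no trainable parameters.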