We present Product of Experts (PoE) as a scalable local learning framework that replaces end-to-end backpropagation with per-stage detached cross-entropy losses projected through a shared output head. At 1.3B parameters on the ClimbMix pretraining corpus, clustered PoE (4 stages × 6 layers) produces a bounded architectural trade-off: a 6.52% BPB gap versus a matched backpropagation baseline (PoE: 0.720935, BP: 0.676788), in exchange for a family of inference-time capabilities that a standard BP-trained model cannot access without retraining or accuracy loss. The gap widens convexly through training (+4.32% at step 1k → +6.52% at step 26,430 final), with 31% of the widening concentrated in the final 6K warmdown steps. Combined with the non-compressing r=10 → r=20 budget response (6.0% → 6.52%), the evidence supports a structural-floor interpretation (H-S): the gap reflects a bounded architectural cost of local learning rather than a training-budget artifact. Architectural consequences released with this paper include: stage prefix pruning (4× compute reduction at 87.5% factual accuracy), WAND adaptive depth (1.82× wall-clock at 100% top-1 agreement), speculative decoding with zero added parameters (1.87× speedup at 88% acceptance), parallel stage composition (+2.4 logit margin via log-space expert combination), and post-hoc specialist stages via dual-head construction that preserves the base bit-identically (Δlogit = 0.0000 across 12 checkpoints). CORE benchmark results are task-polarized: PoE underperforms on rare-fact retrieval (Jeopardy −16.2pp, SQuAD −18.4pp, LAMBADA −15.0pp) but exceeds BP on commonsense reasoning (PIQA +5.0pp, CommonsenseQA +5.8pp) and algorithmic pattern recognition (BigBench CS Algorithms +11.4pp). Deployment positioning: datacenter quality-critical inference favors BP; on-device inference favors PoE's architectural elasticity.
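As a rough illustration of two quantities mentioned in the abstract, the sketch below checks the reported relative BPB gap from the two stated values and shows what a generic log-space product-of-experts combination looks like: per-stage log-probabilities are summed and renormalized. The function names and the toy logits are hypothetical and chosen only for illustration; this is not the paper's released code, and the paper's exact combination rule may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def combine_experts_log_space(stage_logits):
    """Generic product of experts: sum per-stage log-probs, then renormalize.

    stage_logits is a list of per-stage logit vectors over the same vocabulary.
    (Hypothetical helper; an assumption about how stages could be combined.)
    """
    per_stage = [log_softmax(logits) for logits in stage_logits]
    summed = [sum(column) for column in zip(*per_stage)]
    return log_softmax(summed)


def relative_gap_percent(poe_bpb, bp_bpb):
    """Relative BPB gap of PoE over the BP baseline, in percent."""
    return 100.0 * (poe_bpb - bp_bpb) / bp_bpb


# BPB values as stated in the abstract.
gap = relative_gap_percent(0.720935, 0.676788)

# Toy example: two stages over a two-token vocabulary (made-up logits).
combined = combine_experts_log_space([[2.0, 0.0], [1.0, 0.0]])
margin = combined[0] - combined[1]
```

In this toy case the individual stage margins (2.0 and 1.0) add in log space, so agreeing experts sharpen the combined distribution. That is the general mechanism by which a product of experts can increase a logit margin.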
Jaepil Jeong studied this question.
synapsesocial.com/papers/69e867136e0dea528ddeb5eb — DOI: https://doi.org/10.5281/zenodo.19657385
Jaepil Jeong
Cognizant (United States)