We present Product of Experts (PoE) as a scalable local learning framework that replaces end-to-end backpropagation (BP) with per-stage detached cross-entropy losses projected through a shared output head. At 1.3B parameters on the ClimbMix pretraining corpus, clustered PoE (4 stages × 6 layers) produces a bounded architectural trade-off: a 6.52% bits-per-byte (BPB) gap versus a matched BP baseline (PoE: 0.720935, BP: 0.676788), in exchange for a family of inference-time capabilities that a standard BP-trained model cannot access without retraining or accuracy loss. The gap widens convexly through training (+4.32% at step 1k → +6.52% at the final step, 26,430), with 31% of the widening concentrated in the final 6K warmdown steps. Combined with the non-compressing r=10 → r=20 budget response (6.0% → 6.52%), the evidence supports a structural-floor interpretation (H-S): the gap reflects a bounded architectural cost of local learning rather than a training-budget artifact. The architectural capabilities released with this paper include stage prefix pruning (4× compute reduction at 87.5% factual accuracy), WAND adaptive depth (1.82× wall-clock speedup at 100% top-1 agreement), speculative decoding with zero added parameters (1.87× speedup at 88% acceptance), parallel stage composition (+2.4 logit margin via log-space expert combination), and post-hoc specialist stages via a dual-head construction that preserves the base model bit-identically (Δlogit = 0.0000 across 12 checkpoints). CORE benchmark results are task-polarized: PoE underperforms BP on rare-fact retrieval (Jeopardy −16.2pp, SQuAD −18.4pp, LAMBADA −15.0pp) but exceeds it on commonsense reasoning (PIQA +5.0pp, CommonsenseQA +5.8pp) and algorithmic pattern recognition (BigBench CS Algorithms +11.4pp). In deployment terms, quality-critical datacenter inference favors BP, while on-device inference favors PoE's architectural elasticity.
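As an illustration of the log-space expert combination mentioned above, the following minimal sketch sums per-stage log-probabilities and renormalizes them, which is the standard product-of-experts construction. It is not taken from the paper's code: the function names `log_softmax` and `combine_experts`, the two-stage setup, and the toy logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def combine_experts(stage_logits):
    """Product of experts in log space (hypothetical helper).

    Each stage's logits are normalized to log-probabilities, the
    log-probabilities are summed per vocabulary entry (a product in
    probability space), and the result is renormalized.
    """
    per_stage = [log_softmax(logits) for logits in stage_logits]
    summed = [sum(column) for column in zip(*per_stage)]
    return log_softmax(summed)


# Toy example: two stages scoring a three-token vocabulary (made-up values).
stage_logits = [
    [2.0, 1.0, 0.0],
    [1.5, 1.2, 0.1],
]
combined = combine_experts(stage_logits)
probs = [math.exp(lp) for lp in combined]
top1_margin = combined[0] - combined[1]
```

In this construction the top-1 margin of the combined distribution equals the sum of the per-stage margins, so stages that agree reinforce each other. Whether the paper's parallel stage composition weights or calibrates the stages differently is not stated in the abstract.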
Jaepil Jeong
Cognizant (United States)
synapsesocial.com/papers/69e867136e0dea528ddeb5eb — DOI: https://doi.org/10.5281/zenodo.19657385