Saad-Falcon et al. (2025) introduced Intelligence Per Watt (IPW), defined as task accuracy divided by power consumed, as the critical metric for tracking AI efficiency. Their longitudinal study documents a 5.3x IPW improvement from 2023 to 2025, driven by model and hardware advances. This paper demonstrates that a stack of algorithmic efficiency optimizations, derived from a unified stochastic health monitoring framework, provides an additional multiplicative IPW improvement on top of whatever hardware is available. The core three-layer algorithmic stack (FlashAttention, run-level power metric allocation, and early exit) provides a combined 23x IPW improvement at a sequence length of 4,096 tokens through three orthogonal mechanisms: per-operation memory bandwidth efficiency (FlashAttention, 2.86x), allocation efficiency, which reduces which operations occur (power metric inference, 5.18x), and depth efficiency, which reduces how many layers each operation uses (early exit, 1.56x). Because these layers are independent, their gains compound multiplicatively: 2.86 × 5.18 × 1.56 ≈ 23. Applied on top of Saad-Falcon et al.'s 2025 hardware baseline (5.3x over 2023), the combined IPW improvement is estimated at up to approximately 122x (23 × 5.3) versus the 2023 baseline. The full stack, including speculative decoding and quality-preserving layers, reaches a 70x improvement from algorithmic gains alone. Critically, these algorithmic gains are available today on existing hardware; they do not require waiting for the next hardware generation.
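The compounding arithmetic behind the headline figures can be checked with a minimal sketch. It assumes, as the abstract states, that the three layers are independent, so their gains multiply. The dictionary keys and the helper name `combined_gain` are illustrative labels only and are not taken from the paper's code.

```python
from math import prod

# Per-layer IPW multipliers at sequence length 4,096, as quoted in the abstract.
# The key names are illustrative labels, not identifiers from the paper.
LAYER_GAINS = {
    "flash_attention": 2.86,       # per-operation memory bandwidth efficiency
    "power_metric_inference": 5.18,  # allocation efficiency
    "early_exit": 1.56,            # depth efficiency
}

# IPW improvement from 2023 to 2025 reported by Saad-Falcon et al. (2025).
HARDWARE_GAIN_2023_TO_2025 = 5.3


def combined_gain(gains):
    """Multiply gains together, assuming the mechanisms are independent."""
    return prod(gains)


algorithmic = combined_gain(LAYER_GAINS.values())
total_vs_2023 = algorithmic * HARDWARE_GAIN_2023_TO_2025

print(f"core algorithmic stack: {algorithmic:.1f}x")
print(f"with 2023-2025 hardware gain: {total_vs_2023:.0f}x")
```

Rounded to whole numbers, the two products match the roughly 23x and 122x figures quoted above. If the layers were not fully independent, the product would overstate the combined gain, which is why the independence assumption matters to the estimate.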
Cole Cantrell (Tue,) studied this question.