This paper is a combination calculator, not a new algorithm. It quantifies the compound efficiency of two fully independent methods that address different layers of the inference compute stack and can be deployed simultaneously without modification to either. FlashAttention (Dao et al., 2022, 2023) is an exact attention algorithm that reorganizes memory access to reduce bandwidth consumption per operation by 15-78%, depending on sequence length. It does not prune tokens, skip computation, or change what is computed; it changes only how the computation is carried out. The power metric framework (Cantrell, 2026) is an allocation signal that operates at a completely different level: it identifies unproductive training runs or inference samples and stops them early, saving 21-43% of training compute (hypothesized, Paper 1) or 92.7% of sampling compute (simulation, Paper 2). Because the two methods address distinct bottlenecks (memory bandwidth vs. allocation waste), their savings compound multiplicatively. At a sequence length of 4,096 tokens, combined inference savings reach 97.4% (a 39.1x multiplier), combining FlashAttention (65%) with adaptive sampling reduction (92.7%, Paper 2 simulation). Combined training savings reach 75.2% (4.0x), using the hypothesized 29% training reduction from the Paper 1 signal analysis. At a sequence length of 16,384 tokens, combined inference savings reach 98.4% (62.3x). These results are computed from published FlashAttention benchmarks (Dao et al., 2022, 2023) and power metric results from prior work (Cantrell, 2026). No new experiments are required: the combination formula is total_savings = 1 - (1 - FA_savings) × (1 - PM_savings), which follows from the independence of the two optimizations. We propose that FlashAttention and the power metric form a natural two-layer efficiency stack, and we suggest the Intelligence Per Watt metric (Mirhoseini et al., 2025) as the unified measure of the combined improvement.
Keywords: FlashAttention, memory bandwidth, IO-awareness, power metric, compute efficiency, two-layer stack, multiplicative savings, intelligence per watt, training efficiency
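The combination formula stated in the abstract can be checked with a few lines of Python. This is a minimal sketch, not code from the paper: the function names `combined_savings` and `multiplier` are illustrative, and the 78% FlashAttention figure used for 16,384 tokens is an assumption taken from the upper end of the quoted 15-78% range, since the abstract does not state the per-length value explicitly.

```python
def combined_savings(fa_savings: float, pm_savings: float) -> float:
    """Fraction of baseline compute saved when two independent reductions apply.

    Implements total_savings = 1 - (1 - FA_savings) * (1 - PM_savings).
    Both inputs are fractions in [0, 1).
    """
    for name, value in (("fa_savings", fa_savings), ("pm_savings", pm_savings)):
        if not 0.0 <= value < 1.0:
            raise ValueError(f"{name} must be in [0, 1), got {value}")
    return 1.0 - (1.0 - fa_savings) * (1.0 - pm_savings)


def multiplier(savings: float) -> float:
    """Efficiency multiplier implied by a savings fraction: 1 / (1 - savings)."""
    return 1.0 / (1.0 - savings)


# Scenario inputs quoted in the abstract, except where marked as an assumption.
scenarios = {
    "inference, 4,096 tokens": (0.65, 0.927),
    "training, 4,096 tokens": (0.65, 0.29),
    # Assumption: 0.78 is the top of the 15-78% range, not a stated per-length value.
    "inference, 16,384 tokens": (0.78, 0.927),
}

for label, (fa, pm) in scenarios.items():
    total = combined_savings(fa, pm)
    print(f"{label}: savings={total:.4f}, multiplier={multiplier(total):.2f}x")
```

The multiplier is simply the reciprocal of the remaining compute fraction, which is why a few points of additional savings near 100% translate into large changes in the multiplier.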
Cole Cantrell (Thu,) studied this question.