What question did this study set out to answer?

This study estimates the per-query energy use of AI inference under large-scale deployment assumptions, using a bottom-up framework intended to correct the overestimation found in many public figures.

June 11, 2026 · Open Access

Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute

Key Points

Estimates the per-query energy use of AI inference under large-scale deployment assumptions.
Introduces a bottom-up framework that estimates inference energy from token throughput, node power, and overhead.
Analyzes frontier-scale models (>200B parameters) served on H100 nodes.
Estimates datacenter-scale demand for serving 1 billion queries/day: 0.7 GWh/day, rising to 1.7 GWh/day if 10% are long queries and falling to 0.8 GWh/day with efficiency interventions.
Finds a median energy use of 0.31 Wh/query (IQR 0.16–0.60), which indicates that widely cited public estimates are overstated by 4–20×.
For long queries (test-time scaling, 15× longer than typical), median energy rises 13× to 3.91 Wh (IQR 2.15–7.05).
Identifies line-of-sight energy reductions of 8–20× across models, serving systems, and hardware.

Abstract

As AI inference scales to billions of queries, estimates of per-query energy use are increasingly important for capacity planning, efficiency interventions, and policy. Yet many public estimates assume non-production settings, leading to systematic overestimation. We introduce a bottom-up framework that estimates inference energy from token throughput, node power, and overhead under large-scale deployment assumptions. For frontier-scale models (>200B parameters) on H100 nodes, we estimate a median energy of 0.31 Wh/query (IQR 0.16–0.60), indicating that widely cited estimates are overstated by 4–20×. In test-time scaling scenarios with queries 15× longer than typical, the median energy rises 13× to 3.91 Wh (IQR 2.15–7.05). Across models, serving systems, and hardware, we estimate 8–20× line-of-sight energy reductions. At datacenter scale, serving 1 billion queries/day requires 0.7 GWh/day; if 10% are long queries, demand rises to 1.7 GWh/day. With efficiency interventions, it falls to 0.8 GWh/day, mitigating the energy impact of test-time scaling. This version also includes an estimate of water use per query for hyperscalers.

This repository is provided for research and informational purposes only and does not constitute legal, regulatory, compliance, or policy guidance. Results should be interpreted as assumption-dependent and directional, not as definitive measurements of AI energy or water use across all systems or as guaranteed efficiency outcomes. The analysis focuses on per-query inference energy and efficiency pathways only and is not a full environmental or lifecycle assessment.
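The abstract describes the framework only at a high level (token throughput, node power, and overhead), so the following is a minimal Python sketch of how such a bottom-up estimate could be assembled, not the study's actual model. The function names, the single multiplicative overhead factor, and every input value in the example are illustrative assumptions and are not taken from the study.

```python
def energy_per_query_wh(output_tokens, node_tokens_per_s, node_power_w, overhead_factor):
    """Rough per-query energy in Wh (hypothetical helper, not the study's model).

    output_tokens     : tokens generated for one query
    node_tokens_per_s : aggregate token throughput of one serving node
    node_power_w      : average electrical power drawn by that node, in watts
    overhead_factor   : multiplier for facility overhead (assumed PUE-like, >= 1)
    """
    if node_tokens_per_s <= 0:
        raise ValueError("node_tokens_per_s must be positive")
    # Share of node time attributable to this query, in seconds.
    node_seconds = output_tokens / node_tokens_per_s
    # Energy in joules, scaled up for overhead, then converted to watt-hours.
    joules = node_seconds * node_power_w * overhead_factor
    return joules / 3600.0


def daily_energy_gwh(queries_per_day, mean_wh_per_query):
    """Fleet-level daily energy in GWh from a mean per-query energy in Wh."""
    return queries_per_day * mean_wh_per_query / 1e9


# Placeholder inputs chosen only to exercise the arithmetic.
example_wh = energy_per_query_wh(
    output_tokens=500,
    node_tokens_per_s=5000,
    node_power_w=10_000,
    overhead_factor=1.2,
)
print(f"{example_wh:.3f} Wh/query")
```

Fleet totals depend on the mean energy per query over the whole query mix rather than on the median, so multiplying the reported median by the daily query count should not be expected to reproduce the datacenter-scale figures quoted in the abstract.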
