Empirically-Calibrated H100 Node Power Models for Reducing Uncertainty in AI Training Energy Estimation
By: Alex C. Newkirk, Jared Fernandez, Jonathan Koomey, and more
Potential Business Impact:
AI training computers use less power than their ratings suggest.
As AI's energy demand continues to grow, it is critical to better understand the characteristics of this demand in order to improve grid infrastructure planning and environmental assessment. By combining empirical measurements from Brookhaven National Laboratory during AI training on 8-GPU H100 systems with open-source benchmarking data, we develop statistical models relating computational intensity to node-level power consumption. We measure the gap between manufacturer-rated thermal design power (TDP) and actual power demand during AI training. Our analysis reveals that even computationally intensive workloads operate at only 76% of the 10.2 kW TDP rating. Our architecture-specific model, calibrated to floating-point operations, predicts energy consumption with 11.4% mean absolute percentage error, significantly outperforming TDP-based approaches (27-37% error). We also identify distinct power signatures between transformer and CNN architectures, with transformers showing characteristic fluctuations that may affect grid stability.
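To make the gap between TDP-based and empirically calibrated estimates concrete, the sketch below contrasts the two approaches for a single 8-GPU H100 node. The idle power, utilization level, and linear model form are illustrative assumptions, not the paper's fitted coefficients; only the 10.2 kW TDP and the roughly 76% observed peak fraction come from the abstract.

```python
# Illustrative sketch (not the paper's fitted model): compares a naive
# TDP-based energy estimate against a hypothetical power model calibrated
# to measured draw. Coefficients marked as assumptions are placeholders.

NODE_TDP_KW = 10.2          # 8-GPU H100 node rated thermal design power (from abstract)
MEASURED_PEAK_FRACTION = 0.76  # intensive workloads drew ~76% of TDP (from abstract)


def tdp_energy_kwh(train_hours: float) -> float:
    """Naive estimate: assume the node runs at full rated TDP the whole time."""
    return NODE_TDP_KW * train_hours


def calibrated_energy_kwh(train_hours: float,
                          idle_kw: float = 2.0,          # assumed idle draw
                          utilization: float = 0.9        # assumed average utilization
                          ) -> float:
    """Hypothetical linear power model: idle draw plus a utilization-scaled
    dynamic component, capped at the empirically observed peak draw."""
    peak_kw = NODE_TDP_KW * MEASURED_PEAK_FRACTION
    power_kw = idle_kw + utilization * (peak_kw - idle_kw)
    return power_kw * train_hours


if __name__ == "__main__":
    hours = 100.0
    naive = tdp_energy_kwh(hours)
    calibrated = calibrated_energy_kwh(hours)
    # The TDP-based figure overstates energy, consistent with the 27-37%
    # error range the abstract reports for TDP-based approaches.
    print(f"TDP-based estimate:  {naive:8.1f} kWh")
    print(f"Calibrated estimate: {calibrated:8.1f} kWh")
    print(f"Overestimate:        {100 * (naive / calibrated - 1):5.1f} %")
```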
Similar Papers
Power Stabilization for AI Training Datacenters
Hardware Architecture
Smooths out big computer power spikes.
Small is Sufficient: Reducing the World AI Energy Consumption Through Model Selection
Computers and Society
Saves energy by using smaller AI models.