Characterizing GPU Energy Usage in Exascale-Ready Portable Science Applications
By: William F. Godoy, Oscar Hernandez, Paul R. C. Kent and more
Potential Business Impact:
Saves energy on supercomputers by using less precise numbers.
We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of particle and mesh solvers: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double-precision and single (mixed)-precision benchmarks on NVIDIA's A100 and H100 GPUs and AMD's MI250X GPUs, using queries in NVML and rocm_smi_lib, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings range from 6% to 25% on QMCPACK and reach 45% on AMReX-Castro. We also found gaps in the AMD tooling used on Frontier GPUs that need to be understood, while NVML results show little variability across query resolutions between 1 ms and 1 s. Overall, application-level knowledge is crucial to define energy-cost/science-benefit opportunities for the codesign of future supercomputer architectures in the post-Moore era.
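As a rough illustration of how an energy figure and a savings percentage can be derived from a sampled power trace, the sketch below integrates power over time with the trapezoidal rule and compares two runs. It is not the authors' method. The function names and the sample traces are hypothetical, and the numbers are invented for the example rather than taken from the paper. In practice the samples would come from a tool such as NVML or rocm_smi_lib; here they are plain lists so the example stays self-contained.

```python
def energy_joules(times_s, power_w):
    """Integrate a sampled power trace (watts) over time (seconds)
    with the trapezoidal rule and return energy in joules."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def savings_percent(baseline_j, reduced_j):
    """Relative energy saving of a run versus a baseline, in percent."""
    return 100.0 * (baseline_j - reduced_j) / baseline_j


# Hypothetical traces sampled once per second (illustrative values only).
times = [0.0, 1.0, 2.0, 3.0]
double_precision_w = [300.0, 300.0, 300.0, 300.0]
mixed_precision_w = [240.0, 240.0, 240.0, 240.0]

e_double = energy_joules(times, double_precision_w)
e_mixed = energy_joules(times, mixed_precision_w)
print(f"double: {e_double:.1f} J, mixed: {e_mixed:.1f} J, "
      f"savings: {savings_percent(e_double, e_mixed):.1f}%")
```

Because the integral covers the whole run, a mixed-precision run that draws similar power but finishes sooner also shows up as an energy saving, which is one reason a time-to-solution metric is usually read alongside the trace.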
Similar Papers
On the Challenges of Energy-Efficiency Analysis in HPC Systems: Evaluating Synthetic Benchmarks and Gromacs
Distributed, Parallel, and Cluster Computing
Makes computer energy use easier to measure.
Managing Multi Instance GPUs for High Throughput and Energy Savings
Distributed, Parallel, and Cluster Computing
Makes computer chips run much faster and better.
Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX
Hardware Architecture
Reuses old computer parts for faster AI.