Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
By: Melanie Cornelius, Greg Cross, Shilpika Shilpika and more
Potential Business Impact:
Saves computer energy by watching how it works.
As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads. This work addresses that challenge by presenting a data co-analysis approach using system data collected from the Polaris supercomputer at Argonne National Laboratory. We focus on GPU utilization and power demands, navigating the complexities of large-scale, heterogeneous datasets. Our approach, which combines data preprocessing, post-processing, and statistical methods, reduces the data volume by 94% while preserving essential insights. Through this analysis, we uncover key opportunities for power optimization, such as reducing high idle power costs, applying power strategies at the job level, and aligning GPU power allocation with workload demands. Our findings provide actionable insights for energy-efficient computing and offer a practical, reproducible approach for applying existing research to optimize system performance.
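To make the idea of condensing telemetry while keeping job-level power insight concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the sample records, the field layout, the 5% idle threshold, and the function names `summarize_by_job` and `reduction_ratio` are all assumptions chosen for illustration. It only shows how per-sample GPU readings might be collapsed into one summary row per job, with the share of energy drawn while idle kept as a statistic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry samples: (job_id, gpu_utilization_percent, gpu_power_watts).
# These values are invented for illustration and are not Polaris data.
SAMPLES = [
    ("job-a", 0.0, 55.0),
    ("job-a", 92.0, 310.0),
    ("job-a", 95.0, 330.0),
    ("job-b", 0.0, 54.0),
    ("job-b", 0.0, 56.0),
    ("job-b", 40.0, 150.0),
]

# Assumed cutoff (percent utilization) below which a sample counts as idle.
IDLE_UTIL_THRESHOLD = 5.0


def summarize_by_job(samples, idle_threshold=IDLE_UTIL_THRESHOLD):
    """Condense raw samples into one summary record per job."""
    grouped = defaultdict(list)
    for job_id, util, power in samples:
        grouped[job_id].append((util, power))

    summary = {}
    for job_id, rows in grouped.items():
        powers = [p for _, p in rows]
        idle_powers = [p for u, p in rows if u < idle_threshold]
        total_power = sum(powers)
        summary[job_id] = {
            "samples": len(rows),
            "mean_power_w": mean(powers),
            "max_power_w": max(powers),
            # Fraction of samples taken while the GPU was idle.
            "idle_fraction": len(idle_powers) / len(rows),
            # Share of the summed power readings attributable to idle samples.
            "idle_power_share": sum(idle_powers) / total_power if total_power else 0.0,
        }
    return summary


def reduction_ratio(samples, summary):
    """Fraction of records eliminated by replacing samples with per-job summaries."""
    return 1 - len(summary) / len(samples)


if __name__ == "__main__":
    result = summarize_by_job(SAMPLES)
    for job_id, stats in sorted(result.items()):
        print(job_id, stats)
    print("record reduction:", reduction_ratio(SAMPLES, result))
```

In this toy setup a job with a high `idle_power_share` would be a candidate for the kind of idle-power or job-level power strategy the abstract mentions. The reduction achieved depends entirely on how many samples each job contributes, so it says nothing about the 94% figure reported in the paper.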