ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace
By: Ruimin Shi, Gabin Schieffer, Maya Gokhale and more
Potential Business Impact:
Makes computers run much faster using special instructions.
Vector architectures are essential for boosting computing throughput. ARM provides SVE as the next-generation length-agnostic vector extension beyond traditional fixed-length SIMD. This work provides a first study of the maturity and readiness of exploiting ARM and SVE in HPC. Using selected performance hardware events on the ARM Grace processor and analytical models, we derive new metrics to quantify how effectively SVE vectorization reduces executed instructions and improves performance. We further propose an adapted roofline model that combines vector length and data elements to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying the SVE-boosted performance in applications.
Similar Papers
Improving compiler support for SIMD offload using Arm Streaming SVE
Programming Languages
Helps computers use special chips for faster math.
Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension
Performance
Speeds up computer weather forecasts and saves energy.
oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science
Distributed, Parallel, and Cluster Computing
Makes computers learn faster on new chips.