
ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace

Published: May 14, 2025 | arXiv ID: 2505.09462v1

By: Ruimin Shi, Gabin Schieffer, Maya Gokhale and more

Potential Business Impact:

Shows how special vector instructions (ARM SVE) can help scientific computing applications run faster.

Business Areas:
GPU Hardware

Vector architectures are essential for boosting computing throughput. ARM provides the Scalable Vector Extension (SVE) as its next-generation, vector-length-agnostic extension, going beyond traditional fixed-length SIMD. This work presents a first study of the maturity and readiness of ARM and SVE for HPC. Using selected hardware performance events on the ARM-based Nvidia Grace processor together with analytical models, we derive new metrics that quantify how effectively SVE vectorization reduces executed instructions and improves speedup. We further propose an adapted roofline model that combines vector length and data elements to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying applications by the performance benefit they gain from SVE.
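To make the abstract's two core ideas more concrete, here is a minimal Python sketch, assuming a simplified model rather than real hardware. It emulates a vector-length-agnostic loop, where a per-lane predicate masks off lanes past the end of the data so the same code works for any legal vector length. It also computes a crude "instruction reduction" ratio and evaluates a classic roofline bound. All function names (`vla_add`, `instruction_reduction`, `peak_gflops`, `roofline`) and all machine parameters are hypothetical. This is not the paper's adapted roofline model, not its metrics, and not SVE intrinsics code.

```python
import math


def vla_add(a, b, vl_bits, elem_bits=64):
    """Emulate a vector-length-agnostic loop computing c[i] = a[i] + b[i].

    Each iteration handles `lanes` elements; a per-lane predicate (in the
    spirit of SVE's WHILELT) masks off lanes past the end of the arrays,
    so no separate scalar tail loop is needed.
    Returns (result, number_of_vector_iterations).
    """
    # SVE vector lengths are multiples of 128 bits, from 128 up to 2048.
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("invalid SVE vector length")
    lanes = vl_bits // elem_bits
    n = len(a)
    c = [0.0] * n
    iterations = 0
    for i in range(0, n, lanes):
        # Predicate: lane k is active only while i + k < n.
        pred = [i + k < n for k in range(lanes)]
        for k, active in enumerate(pred):
            if active:
                c[i + k] = a[i + k] + b[i + k]
        iterations += 1
    return c, iterations


def instruction_reduction(n, vl_bits, elem_bits=64):
    """Ratio of scalar loop iterations to vector loop iterations: a crude,
    hypothetical proxy for the drop in executed instructions."""
    if n == 0:
        return 1.0
    lanes = vl_bits // elem_bits
    return n / math.ceil(n / lanes)


def peak_gflops(cores, ghz, fma_pipes, vl_bits, elem_bits=64):
    """Textbook compute peak: each FMA counts as 2 flops per lane."""
    return cores * ghz * fma_pipes * (vl_bits // elem_bits) * 2


def roofline(ai, peak, bandwidth_gbs):
    """Classic roofline: attainable GFLOP/s = min(peak, AI * bandwidth)."""
    return min(peak, ai * bandwidth_gbs)


if __name__ == "__main__":
    _, iters = vla_add([1.0] * 10, [2.0] * 10, vl_bits=128)
    print(iters)  # 10 doubles, 2 lanes per 128-bit vector -> 5 iterations
    # Hypothetical machine parameters, not measured Nvidia Grace figures.
    peak = peak_gflops(cores=8, ghz=2.0, fma_pipes=2, vl_bits=256)
    print(roofline(ai=0.5, peak=peak, bandwidth_gbs=100.0))
```

Widening the vector raises the compute ceiling in `peak_gflops`. A kernel with low arithmetic intensity still stays bandwidth-bound in `roofline`, which is the kind of bottleneck a roofline-style analysis is meant to expose.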

Page Count
15 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing