ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace

Published: May 14, 2025 | arXiv ID: 2505.09462v1

By: Ruimin Shi, Gabin Schieffer, Maya Gokhale and more

Potential Business Impact:

Makes computers run much faster using special instructions.

Business Areas:

GPU Hardware

Vector architectures are essential for boosting computing throughput. ARM provides SVE as the next-generation length-agnostic vector extension beyond traditional fixed-length SIMD. This work provides a first study of the maturity and readiness of exploiting ARM and SVE in HPC. Using selected performance hardware events on the ARM Grace processor and analytical models, we derive new metrics to quantify the effectiveness of exploiting SVE vectorization to reduce executed instructions and improve performance speedup. We further propose an adapted roofline model that combines vector length and data elements to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying the SVE-boosted performance in applications.

Improving compiler support for SIMD offload using Arm Streaming SVE

Programming Languages

Helps computers use special chips for faster math.

2 Jun 2025

87%

Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension

Performance

Speeds up computer weather forecasts and saves energy.

3 Mar 2025

87%

oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science

Distributed, Parallel, and Cluster Computing

Makes computers learn faster on new chips.

5 Apr 2025

Page Count

15 pages
