Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension
By: Hongguang Chen
Potential Business Impact:
Speeds up computer weather forecasts and saves energy.
Stencil computation is essential in high-performance computing, especially for large-scale tasks such as fluid simulation and weather forecasting. Optimizing its performance can reduce both energy consumption and computation time, which is critical for time-sensitive applications such as disaster prediction. This paper explores optimization techniques for the 7-point 3D stencil on ARM's Scalable Vector Extension (SVE), using the Roofline model and tools such as gem5 and CACTI. We evaluate software optimizations, including vectorization and tiling, as well as hardware adjustments to SVE vector lengths and cache configurations. The study also examines trade-offs among performance, power consumption, and chip area to identify optimal configurations for ARM-based systems.
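To make the kernel concrete, here is a minimal C sketch of one 7-point 3D stencil sweep with loop tiling (cache blocking over j and k), plus a Roofline bound helper. It is not the paper's code. The coefficients c0 and c1, the tile sizes TILE_J and TILE_K, and the function names stencil7_tiled and roofline_gflops are hypothetical. The unit-stride inner loop is the part a compiler can auto-vectorize for SVE, for example with GCC's -O3 -march=armv8-a+sve. Because SVE code is vector-length agnostic, the same binary can in principle run at any hardware vector length.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into an nx*ny*nz grid stored with i (x) varying fastest. */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (size_t)(ny) * (size_t)(nx) + (size_t)(j) * (size_t)(nx) + (size_t)(i))

/* Hypothetical tile sizes; in practice they would be tuned to the cache
   configuration being studied. */
#ifndef TILE_J
#define TILE_J 16
#endif
#ifndef TILE_K
#define TILE_K 16
#endif

static int min_int(int a, int b) { return a < b ? a : b; }

/* One Jacobi-style sweep of a 7-point stencil: reads `in`, writes `out`.
   c0 weights the center point and c1 the six face neighbors (illustrative
   assumption). Only interior points are written; boundaries of `out` are
   left untouched. */
void stencil7_tiled(const double *restrict in, double *restrict out,
                    int nx, int ny, int nz, double c0, double c1)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    const size_t sx = 1, sy = (size_t)nx, sz = (size_t)nx * (size_t)ny;

    for (int kk = 1; kk < nz - 1; kk += TILE_K)
        for (int jj = 1; jj < ny - 1; jj += TILE_J)
            for (int k = kk; k < min_int(kk + TILE_K, nz - 1); ++k)
                for (int j = jj; j < min_int(jj + TILE_J, ny - 1); ++j)
                    /* Unit-stride inner loop: the auto-vectorization target. */
                    for (int i = 1; i < nx - 1; ++i) {
                        size_t c = IDX(i, j, k, nx, ny);
                        out[c] = c0 * in[c]
                               + c1 * (in[c - sx] + in[c + sx]
                                     + in[c - sy] + in[c + sy]
                                     + in[c - sz] + in[c + sz]);
                    }
}

/* Roofline bound: attainable GFLOP/s = min(peak compute, AI * bandwidth). */
double roofline_gflops(double peak_gflops, double bw_gbs, double flops_per_byte)
{
    double mem_bound = flops_per_byte * bw_gbs;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}
```

Each updated point costs 8 floating-point operations (5 adds to sum the neighbors, 2 multiplies, 1 final add). With ideal cache reuse it moves at least 16 bytes (one double read, one written), giving an arithmetic intensity of about 0.5 FLOP/byte, or about 0.33 if write-allocate traffic is counted. At that intensity the kernel typically sits on the memory-bound side of the Roofline, which is why tiling for cache reuse matters alongside vectorization.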
Similar Papers
Improving compiler support for SIMD offload using Arm Streaming SVE
Programming Languages
Helps computers use special chips for faster math.
ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace
Distributed, Parallel, and Cluster Computing
Makes computers run much faster using special instructions.
oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science
Distributed, Parallel, and Cluster Computing
Makes computers learn faster on new chips.