Performance-Portable Optimization and Analysis of Multiple Right-Hand Sides in a Lattice QCD Solver

Published: January 9, 2026 | arXiv ID: 2601.05816v1

By: Shiting Long, Gustavo Ramirez-Hidalgo, Stepan Nassyr and more

Potential Business Impact:

Speeds up the sparse linear solvers at the core of lattice QCD simulations on both x86 and Arm clusters.

Business Areas:

Quantum Computing Science and Engineering

Managing the high computational cost of iterative solvers for sparse linear systems is a known challenge in scientific computing. Moreover, scientific applications often face memory bandwidth constraints, making it critical to optimize data locality and enhance the efficiency of data transport. We extend the lattice QCD solver DD-$α$AMG to incorporate multiple right-hand sides (rhs) for both the Wilson-Dirac operator evaluation and the GMRES solver, with and without odd-even preconditioning. To optimize auto-vectorization, we introduce a flexible interface that supports various data layouts and implement a new data layout for better SIMD utilization. We evaluate our optimizations on both x86 and Arm clusters, demonstrating performance portability with similar speedups. A key contribution of this work is the performance analysis of our optimizations, which reveals the complexity introduced by architectural constraints and compiler behavior. Additionally, we explore different implementations leveraging a new matrix instruction set for Arm called SME and provide an early assessment of its potential benefits.
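
To make the multiple right-hand-side idea concrete, here is a minimal sketch, not the DD-$α$AMG implementation. It assumes a small real-valued matrix in CSR form standing in for the Wilson-Dirac operator (which in practice is a complex stencil over spinor and colour indices), and a hypothetical "rhs-fastest" flat layout in which the values of all right-hand sides for one site are stored contiguously. The function names `spmv_multi_rhs` and `interleave` are illustrative only.

```python
# Illustrative sketch only: a CSR sparse matrix applied to several right-hand
# sides at once. Names and layout are assumptions, not DD-alphaAMG code.

def spmv_multi_rhs(row_ptr, col_idx, values, x, n_rhs):
    """Compute Y = A X where X holds n_rhs vectors in one flat list.

    Layout assumption: x[col * n_rhs + r] is entry `col` of right-hand side r,
    so the n_rhs values belonging to one site are contiguous in memory.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * (n_rows * n_rhs)
    for row in range(n_rows):
        base_y = row * n_rhs
        for k in range(row_ptr[row], row_ptr[row + 1]):
            a = values[k]               # matrix entry is read once ...
            base_x = col_idx[k] * n_rhs
            for r in range(n_rhs):      # ... and reused for every rhs
                y[base_y + r] += a * x[base_x + r]
    return y


def interleave(vectors):
    """Pack equal-length vectors into the rhs-fastest flat layout."""
    n_rhs = len(vectors)
    n = len(vectors[0])
    return [vectors[r][i] for i in range(n) for r in range(n_rhs)]


# 3x3 example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]

# Two right-hand sides, packed so that both values for a site sit side by side.
x = interleave([[1.0, 2.0, 3.0], [1.0, 0.0, -1.0]])
y = spmv_multi_rhs(row_ptr, col_idx, values, x, n_rhs=2)
```

In this layout each matrix entry is loaded once and applied to all right-hand sides, and the innermost loop walks over contiguous memory. In a compiled language that is the kind of loop a compiler can auto-vectorize, which is the general motivation the abstract gives for batching right-hand sides and choosing the data layout carefully.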