Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation
By: Ali Rezaei, Luc Jaulmes, Maria Bahna and more
Potential Business Impact:
Makes computers simulate quantum computers faster.
Scalable classical simulation of quantum circuits is crucial for advancing both quantum algorithm development and hardware validation. In this work, we focus on performance enhancements through meticulous low-level tuning on a single-node system. This not only advances the performance of classical quantum simulation but also lays the groundwork for scalable, heterogeneous implementations that may eventually bridge the gap toward noiseless quantum computing. Although similar low-level tuning efforts have been reported in the literature, those implementations have not been released as open-source software, which impedes independent evaluation and further development. We introduce an open-source, high-performance extension to the QuEST simulator that brings state-of-the-art low-level and NUMA optimizations to modern computers. Our approach emphasizes locality-aware computation and incorporates hardware-specific optimizations such as NUMA-aware memory allocation, thread pinning, AVX-512 vectorization, aggressive loop unrolling, and explicit memory prefetching. Experiments show significant speedups: 5.5-6.5x for single-qubit gate operations, 4.5x for two-qubit gates, 4x for Random Quantum Circuits (RQC), and 1.8x for Quantum Fourier Transform (QFT). These results demonstrate that rigorous performance tuning can substantially extend the practical simulation capacity of classical quantum simulators on current hardware.
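To make the single-qubit gate kernel mentioned in the abstract concrete, the following is a minimal sketch in plain C of how a 2x2 gate can be applied to a full state vector. It is not the authors' implementation and not part of the QuEST API: the function name `apply_single_qubit_gate`, its parameters, and the row-major gate layout are assumptions made for illustration. The sketch shows only the blocked, stride-based traversal. The optimizations listed in the abstract (AVX-512 vectorization, unrolling, prefetching, NUMA-aware allocation, thread pinning) are not shown here.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not a QuEST function): applies the 2x2 gate m,
 * stored row-major as {m00, m01, m10, m11}, to qubit `target` of a
 * state vector holding 2^num_qubits complex amplitudes. */
void apply_single_qubit_gate(double complex *state, unsigned num_qubits,
                             unsigned target, const double complex m[4])
{
    assert(target < num_qubits);

    const size_t dim = (size_t)1 << num_qubits;  /* number of amplitudes */
    const size_t stride = (size_t)1 << target;   /* distance between paired amplitudes */

    /* The outer loop walks blocks of 2*stride amplitudes. Within a block,
     * amplitude i (target bit 0) is paired with i + stride (target bit 1).
     * The inner loop reads and writes two contiguous runs of memory. */
    for (size_t base = 0; base < dim; base += 2 * stride) {
        for (size_t i = base; i < base + stride; ++i) {
            const double complex a0 = state[i];
            const double complex a1 = state[i + stride];
            state[i]          = m[0] * a0 + m[1] * a1;
            state[i + stride] = m[2] * a0 + m[3] * a1;
        }
    }
}
```

For example, applying a Pauli-X gate `{0, 1, 1, 0}` to qubit 1 of a two-qubit register initialized to amplitudes `{1, 0, 0, 0}` moves the unit amplitude from index 0 to index 2. Because the inner loop touches contiguous memory, a kernel of this shape is a natural starting point for the vectorization and prefetching the abstract describes, although for small `target` values the contiguous runs become short and a different data layout or loop order may be needed.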
Similar Papers
Quantum Approximate Optimization Algorithm: Performance on Simulators and Quantum Hardware
Quantum Physics
Makes quantum computers work better despite errors.
Scalable Memory Recycling for Large Quantum Programs
Quantum Physics
Makes quantum computers run faster and use less memory.
TQml Simulator: Optimized Simulation of Quantum Machine Learning
Quantum Physics
Makes quantum computers learn and work faster.