Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX
By: Alexander Strack, Christopher Taylor, Dirk Pflüger
Potential Business Impact:
Makes many-core chips work faster for science.
Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the SOPHON SG2042, have 64 cores. RISC-V processors with core counts comparable to the SG2042 make efficient parallelization as crucial for RISC-V as for more established architectures such as x86-64. In this work, we evaluate the parallel scaling of the widely used FFTW library on RISC-V with MPI and OpenMP. We compare it side by side with a 64-core AMD EPYC 7742 CPU for different types of FFTW planning. Additionally, we investigate the effect of memory optimizations on RISC-V in HPX-FFT, a parallel FFT library based on the asynchronous many-task runtime HPX that uses FFTW as its backend. For double-precision 2D FFTs, we generally observe a performance gap of a factor of eight between the x86-64 and RISC-V chips. Memory optimizations that are effective in HPX-FFT on x86-64 do not carry over to the RISC-V chip. FFTW with MPI scales well up to 64 cores on both x86-64 and RISC-V, regardless of planning. In contrast, FFTW with OpenMP requires measured planning on both architectures to scale well up to 64 cores. The results of our study mark an early step on the journey to large-scale parallel applications running on RISC-V.