Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment
By: Elaheh Hassani, Md Taufique Hussain, Ariful Azad
Potential Business Impact:
Makes computers process big data much faster.
We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies the local nonzeros to be sent, exchanges the required data, and then builds its local submatrix from the received elements. This approach reduces communication compared to SpGEMM-based methods in distributed memory. By employing synchronization-free multithreaded algorithms, we further accelerate local computations, achieving substantially better performance than existing libraries such as CombBLAS and PETSc. We design efficient software for these operations and evaluate their performance on two university clusters and the Perlmutter supercomputer. Our experiments span a variety of application scenarios, including matrix permutation for load balancing, matrix reordering, subgraph extraction, and streaming graph applications. In all cases, we compare our algorithms against CombBLAS, the most comprehensive distributed library for these operations, and, in some scenarios, against PETSc. Overall, this work provides a comprehensive study of algorithms, software implementations, experimental evaluations, and applications for sparse matrix permutation, extraction, and assignment.
Similar Papers
Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication
Distributed, Parallel, and Cluster Computing
Speeds up computer calculations for science and games.
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
Distributed, Parallel, and Cluster Computing
Makes computers solve hard math problems much faster.
Efficient Parallel Scheduling for Sparse Triangular Solvers
Distributed, Parallel, and Cluster Computing
Makes computers solve math problems much faster.