Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication
By: Thomas McFarland, Julian Bellavita, Giulia Guidi
Potential Business Impact:
Speeds up large scientific computations, such as genome analysis and graph analytics, by running them on GPUs spread across many computers.
Sparse General Matrix Multiply (SpGEMM) is key to various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing the redefinition of addition, multiplication, and the numeric types involved. Today, large input matrices require distributed-memory parallelism to avoid disk I/O, and modern HPC machines with GPUs can greatly accelerate linear algebra computation. In this paper, we implement a GPU-based distributed-memory SpGEMM routine on top of the CombBLAS library. Our implementation achieves a speedup of over 2x compared to the CPU-only CombBLAS implementation and of up to 3x compared to PETSc for large input matrices. Furthermore, we observe that inter-process communication can take either a direct host-to-host or a device-to-device path, and which is faster depends on the message size. To exploit this, we introduce a hybrid communication scheme that dynamically switches data paths based on message size, improving runtimes in communication-bound scenarios.
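To illustrate the semiring abstraction the abstract relies on, here is a minimal standalone C++ sketch (the struct names and the dense inner loop are illustrative, not the CombBLAS API): the same multiply kernel computes an ordinary matrix product under a (+, *) semiring and one relaxation step of all-pairs shortest paths under a (min, +) semiring.

// Minimal sketch of the semiring abstraction (illustrative, not CombBLAS):
// add/multiply are supplied as a template parameter, so one kernel serves
// many algorithms depending on the semiring chosen.
#include <cstdio>
#include <limits>
#include <vector>

// Standard (+, *) semiring over doubles.
struct PlusTimes {
    using T = double;
    static T zero() { return 0.0; }  // additive identity
    static T add(T a, T b) { return a + b; }
    static T mul(T a, T b) { return a * b; }
};

// (min, +) semiring: "addition" is min, "multiplication" is +.
struct MinPlus {
    using T = double;
    static T zero() { return std::numeric_limits<T>::infinity(); }
    static T add(T a, T b) { return a < b ? a : b; }
    static T mul(T a, T b) { return a + b; }
};

// Dense n x n multiply parameterized on the semiring; a real SpGEMM would
// use sparse formats, but the algebraic hook is identical.
template <class SR>
std::vector<typename SR::T> multiply(const std::vector<typename SR::T>& A,
                                     const std::vector<typename SR::T>& B,
                                     int n) {
    std::vector<typename SR::T> C(n * n, SR::zero());
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                C[i * n + j] = SR::add(C[i * n + j],
                                       SR::mul(A[i * n + k], B[k * n + j]));
    return C;
}

int main() {
    const int n = 2;
    std::vector<double> A = {1, 2, 3, 4}, B = {5, 6, 7, 8};
    auto C1 = multiply<PlusTimes>(A, B, n);  // ordinary product: C1[0][0] = 19
    auto C2 = multiply<MinPlus>(A, B, n);    // path relaxation: C2[0][0] = 6
    printf("PlusTimes C[0][0] = %g, MinPlus C[0][0] = %g\n", C1[0], C2[0]);
    return 0;
}

Swapping MinPlus for PlusTimes changes which algorithm is computed without touching the multiply kernel, which is the point of the abstraction.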
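The hybrid communication idea can likewise be sketched with CUDA-aware MPI (the function name, staging buffer, and threshold below are assumptions for illustration, not the paper's code): small messages are staged through host memory and sent host-to-host, while large messages hand the device pointer directly to MPI for a device-to-device transfer.

// Hedged sketch of message-size-based path selection; assumes MPI_Init has
// been called and the MPI library is built with CUDA support.
#include <mpi.h>
#include <cuda_runtime.h>

// Illustrative cutoff (assumed value); the paper switches paths dynamically
// based on message size.
constexpr size_t kHybridThresholdBytes = 1 << 20;  // 1 MiB

void hybrid_send(const void* d_buf, size_t bytes, int dest, int tag,
                 MPI_Comm comm, void* h_staging /* pre-allocated host buffer */) {
    if (bytes < kHybridThresholdBytes) {
        // Small message: copy to host, then send host-to-host.
        cudaMemcpy(h_staging, d_buf, bytes, cudaMemcpyDeviceToHost);
        MPI_Send(h_staging, (int)bytes, MPI_BYTE, dest, tag, comm);
    } else {
        // Large message: pass the device pointer directly to CUDA-aware MPI
        // for device-to-device transfer (e.g., via NVLink or GPUDirect RDMA).
        MPI_Send(d_buf, (int)bytes, MPI_BYTE, dest, tag, comm);
    }
}

The rationale is that the device-to-device path carries a fixed setup cost that is amortized only on large transfers, so the cheaper route is selected per message at runtime.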
Similar Papers
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
Distributed, Parallel, and Cluster Computing
Makes computers solve hard math problems much faster.
Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach
Distributed, Parallel, and Cluster Computing
Makes computers solve problems faster and use less power.
Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment
Distributed, Parallel, and Cluster Computing
Makes computers process big data much faster.