AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
By: Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna
Potential Business Impact:
Speeds up computer analysis of huge, messy data.
Matricized Tensor Times Khatri-Rao Product (MTTKRP) is the computational bottleneck in sparse tensor decomposition. As real-world sparse tensors grow to billions of nonzeros, they increasingly demand higher memory capacity and compute throughput from hardware accelerators. In this work, we present AMPED, a multi-GPU parallel algorithm designed to accelerate MTTKRP on billion-scale sparse tensors. AMPED scales beyond the limits of a single GPU, meeting both the memory and performance requirements of large-scale workloads. We introduce a partitioning strategy combined with a dynamic load balancing scheme to distribute computation and minimize GPU idle time. On real-world billion-scale tensors, AMPED achieves a 5.1x geometric mean speedup in total execution time over state-of-the-art GPU baselines using 4 GPUs on a single CPU node.
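To make the computational pattern concrete, below is a minimal illustrative sketch, not the authors' implementation: it shows mode-1 MTTKRP for a 3-mode sparse tensor in COO format, where each nonzero x(i,j,k) contributes v * (B[j] * C[k]) to output row i, plus a simple static nonzero-count partition of the kind a dynamic multi-GPU load balancer would refine. All names and the equal-count splitting heuristic are assumptions for illustration.

```python
import numpy as np

def mttkrp_mode1(coords, vals, B, C, num_rows, rank):
    """Compute M = X_(1) (C khatri-rao B) for a sparse 3-mode tensor X.

    coords : (nnz, 3) int array of (i, j, k) indices
    vals   : (nnz,) array of nonzero values
    B, C   : factor matrices for modes 2 and 3, shapes (J, rank) and (K, rank)
    """
    M = np.zeros((num_rows, rank))
    for (i, j, k), v in zip(coords, vals):
        # Each nonzero scatters v * (B[j] elementwise* C[k]) into row i;
        # these irregular row updates are what make MTTKRP memory-bound.
        M[i] += v * B[j] * C[k]
    return M

def partition_by_nnz(coords, vals, num_parts):
    """Split nonzeros into roughly equal-sized contiguous chunks, one per GPU.

    This is only a static baseline; a dynamic scheme (as in the paper)
    would rebalance work at runtime to minimize GPU idle time.
    """
    chunks = np.array_split(np.arange(len(vals)), num_parts)
    return [(coords[s], vals[s]) for s in chunks]
```

In a multi-GPU setting, each partition's updates target overlapping output rows, so partial results per GPU must eventually be reduced; the sketch omits that aggregation step.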
Similar Papers
Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
Distributed, Parallel, and Cluster Computing
Makes computers analyze big data much faster.
A Performance Portable Matrix Free Dense MTTKRP in GenTen
Mathematical Software
Makes computers find patterns in data faster.