Beyond Single-GPU: Scaling PDLP to Distributed Multi-GPU Systems
By: Hongpei Li, Yicheng Huang, Huikang Liu, and more
Potential Business Impact:
Solves huge math problems faster on many computers.
In this work, we present a distributed implementation of the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers have shown strong performance on single-node GPU architectures, their applicability to industrial-scale instances is often limited by GPU memory capacity and computational throughput. To overcome these challenges, we extend the PDHG framework to a distributed-memory setting via a practical two-dimensional grid partitioning of the constraint matrix, enabling scalable execution across multiple GPUs. Our implementation leverages the NCCL communication backend to efficiently synchronize primal-dual updates across devices. To improve load balance and computational efficiency, we introduce a block-wise random shuffling strategy combined with nonzero-aware data distribution, and we further accelerate computation through fused CUDA kernels. By distributing both memory and computation, the proposed framework not only overcomes the single-GPU memory bottleneck but also achieves substantial speedups by exploiting multi-GPU parallelism with relatively low communication overhead. Extensive experiments on standard LP benchmarks, including MIPLIB and Hans Mittelmann's benchmark instances, as well as large-scale real-world datasets, show that our distributed implementation, built upon cuPDLPx, achieves strong scalability and high performance while preserving full FP64 numerical accuracy.
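To make the mechanics concrete: for an LP of the form min c^T x subject to Ax >= b, x >= 0, PDHG alternates a projected primal gradient step, x+ = proj_{x>=0}(x - tau * (c - A^T y)), with an extrapolated dual step, y+ = proj_{y>=0}(y + sigma * (b - A(2x+ - x))). In a two-dimensional grid partitioning, the GPU at grid position (r, s) holds the block A_rs of the constraint matrix, so each matrix-vector product is computed blockwise and the partial results are summed across a grid row or column. Below is a minimal sketch of the primal half of one such iteration, assuming a grid with per-column NCCL communicators; the kernel and function names (fusedPrimalUpdate, distributedPrimalStep) are illustrative, not taken from cuPDLPx, and real solvers add restarts and adaptive step sizes not shown here.

// Hedged sketch of a distributed PDHG primal step, not the authors' code.
// Assumption: GPU (r, s) owns block A_rs plus the matching slices of x and y,
// and has already computed its local partial product A_rs^T y_r (e.g. via a
// cuSPARSE SpMV) into aty_partial. The partials are summed over the grid
// column with NCCL, then the gradient step and the projection onto x >= 0
// are fused into a single kernel to avoid an extra pass over memory.

#include <cuda_runtime.h>
#include <nccl.h>

__global__ void fusedPrimalUpdate(double* __restrict__ x_new,
                                  const double* __restrict__ x,
                                  const double* __restrict__ c,
                                  const double* __restrict__ aty,  // (A^T y)_s
                                  double tau, int n_local) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_local) {
        double v = x[i] - tau * (c[i] - aty[i]);  // gradient step
        x_new[i] = v > 0.0 ? v : 0.0;             // projection onto x >= 0
    }
}

// Sum the per-GPU partials A_rs^T y_r over a grid column, then apply the
// fused update to the locally owned slice of x.
void distributedPrimalStep(double* x_new, const double* x, const double* c,
                           double* aty_partial, double tau, int n_local,
                           ncclComm_t col_comm, cudaStream_t stream) {
    // In-place all-reduce: every GPU in the column ends up with (A^T y)_s.
    ncclAllReduce(aty_partial, aty_partial, n_local, ncclDouble, ncclSum,
                  col_comm, stream);
    int threads = 256;
    int blocks = (n_local + threads - 1) / threads;
    fusedPrimalUpdate<<<blocks, threads, 0, stream>>>(
        x_new, x, c, aty_partial, tau, n_local);
}

The dual step follows the same pattern with the roles of rows and columns swapped. Fusing the gradient step with the projection matters because PDHG iterations are memory-bandwidth bound, and the double-precision arithmetic matches the FP64 accuracy the abstract claims.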
Similar Papers
From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem
Distributed, Parallel, and Cluster Computing
Computers solve hard math problems using less power.
Anderson Accelerated Primal-Dual Hybrid Gradient for solving LP
Optimization and Control
Solves math problems much faster.