PIM-FW: Hardware-Software Co-Design of All-pairs Shortest Paths in DRAM
By: Tsung-Han Lu , Zheyu Li , Minxuan Zhou and more
Potential Business Impact:
Makes finding shortest paths much faster.
All-pairs shortest paths (APSP) is a fundamental algorithm used for routing, logistics, and network analysis, but the cubic time complexity and heavy data movement of the canonical Floyd-Warshall (FW) algorithm severely limits its scalability on conventional CPUs or GPUs. In this paper, we propose PIM-FW, a novel co-designed hardware architecture and dataflow that leverages processing in and near memory architecture designed to accelerate blocked FW algorithm on an HBM3 stack. To enable fine-grained parallelism, we propose a massively parallel array of specialized bit-serial bank PE and channel PE designed to accelerate the core min-plus operations. Our novel dataflow complements this hardware, employing an interleaved mapping policy for superior load balancing and hybrid in and near memory computing model for efficient computation and reduction. The novel in-bank computing approach allows all distance updates to be performed and stored in memory bank, a key contribution is that eliminates the data movement bottleneck inherent in GPU-based approaches. We implement a full software and hardware co-design using a cycle-accurate simulator to simulate an 8-channel, 4-Hi HBM3 PIM stack on real road-network traces. Experimental results show that, for a 8192 x 8192 graph, PIM-FW achieves a 18.7x speedup in end-to-end execution, and consumes 3200x less DRAM energy compared to a state-of-the-art GPU-only Floyd-Warshall.
Similar Papers
PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization
Hardware Architecture
Speeds up computer "brains" by moving work closer to memory.
No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory
Hardware Architecture
Chooses best computer memory for each task.
JSPIM: A Skew-Aware PIM Accelerator for High-Performance Databases Join and Select Operations
Hardware Architecture
Speeds up searching data in computers 400x.