Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive
By: Gabin Schieffer, Jacob Wahlgren, Ruimin Shi, and more
Potential Business Impact:
Makes supercomputers share data faster between parts.
The ever-increasing compute performance of GPU accelerators drives up the need for efficient data movement within HPC applications to sustain performance. Proposed as a solution to alleviate the cost of CPU-GPU data movement, the AMD MI300A Accelerated Processing Unit (APU) combines CPU, GPU, and high-bandwidth memory (HBM) within a single physical package. Leadership supercomputers, such as El Capitan, group four APUs within a single compute node, connected by the Infinity Fabric interconnect. In this work, we design targeted benchmarks to evaluate direct memory access from the GPU, explicit inter-APU data movement, and collective multi-APU communication. We also compare the efficiency of HIP APIs, MPI routines, and the GPU-specialized RCCL library. Our results highlight key design choices for optimizing inter-APU communication on multi-APU AMD MI300A systems with Infinity Fabric, including programming interfaces, allocators, and data movement strategies. Finally, we optimize two real HPC applications, Quicksilver and CloverLeaf, and evaluate them on a four-APU MI300A system.
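As a concrete illustration (this is a hypothetical sketch, not the authors' benchmark code), the HIP example below contrasts two of the data-movement paths such benchmarks exercise on a multi-APU node: an explicit inter-APU copy over Infinity Fabric using the HIP API, and direct GPU access to memory obtained from the system allocator, which the MI300A's unified memory design permits (assuming unified memory / XNACK is enabled). Buffer names and sizes are illustrative.

```cpp
// Hypothetical sketch: two data-movement paths on a multi-APU MI300A node.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

#define HIP_CHECK(call)                                           \
  do {                                                            \
    hipError_t err_ = (call);                                     \
    if (err_ != hipSuccess) {                                     \
      fprintf(stderr, "HIP error %s at %s:%d\n",                  \
              hipGetErrorString(err_), __FILE__, __LINE__);       \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

// Simple kernel that scales a buffer in place.
__global__ void scale(double* x, size_t n, double a) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

int main() {
  const size_t n = 1 << 24;                // ~16M doubles (128 MiB)
  const size_t bytes = n * sizeof(double);

  int ndev = 0;
  HIP_CHECK(hipGetDeviceCount(&ndev));
  if (ndev < 2) { fprintf(stderr, "Need at least two APUs\n"); return 1; }

  // HBM allocations on APU 0 and APU 1 via the HIP device allocator.
  double *buf0 = nullptr, *buf1 = nullptr;
  HIP_CHECK(hipSetDevice(0));
  HIP_CHECK(hipMalloc(&buf0, bytes));
  HIP_CHECK(hipSetDevice(1));
  HIP_CHECK(hipMalloc(&buf1, bytes));

  // Path 1: explicit inter-APU copy (APU 0 -> APU 1) over Infinity Fabric.
  HIP_CHECK(hipMemcpyPeer(buf1, 1, buf0, 0, bytes));

  // Path 2: direct GPU access to system-allocated memory. On MI300A, a plain
  // malloc() pointer can be passed to a kernel without hipMalloc or explicit
  // copies (assumes unified memory / XNACK is enabled on the system).
  double* sys = static_cast<double*>(malloc(bytes));
  for (size_t i = 0; i < n; ++i) sys[i] = 1.0;

  HIP_CHECK(hipSetDevice(0));
  scale<<<(n + 255) / 256, 256>>>(sys, n, 2.0);
  HIP_CHECK(hipDeviceSynchronize());

  printf("sys[0] = %f (expected 2.0)\n", sys[0]);

  free(sys);
  HIP_CHECK(hipFree(buf0));
  HIP_CHECK(hipFree(buf1));
  return 0;
}
```

The paper's comparison of programming interfaces and allocators amounts to varying exactly these choices: which API issues the transfer (HIP, MPI, or RCCL) and which allocator backs the buffers (device allocator vs. system allocator).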
Similar Papers
Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
Distributed, Parallel, and Cluster Computing
Lets computers share memory, saving money and time.
AMD MI300X GPU Performance Analysis
Performance
Makes AI models run much faster on new chips.