Persistent and Partitioned MPI for Stencil Communication
By: Gerald Collom, Jason Burmark, Olga Pearce, and more
Potential Business Impact:
Speeds up large-scale parallel simulations by reducing communication time.
Many parallel applications rely on iterative stencil operations, whose performance is dominated by communication costs at large scales. Several MPI optimizations, such as persistent and partitioned communication, reduce overheads and improve communication efficiency through amortized setup costs and reduced synchronization of threaded sends. This paper presents the performance of stencil communication in the Comb benchmarking suite when using nonblocking, persistent, and partitioned communication routines. The impact of each optimization is analyzed at various scales. Further, the paper analyzes the impact of process count, thread count, and message size on partitioned communication routines. Measured timings show that persistent MPI communication can provide a speedup of up to 37% over the baseline MPI communication, and partitioned MPI communication can provide a speedup of up to 68%.
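To illustrate the two optimizations the abstract compares, the sketch below shows persistent communication (`MPI_Send_init`/`MPI_Start`, standardized since MPI-1, which amortizes setup across iterations) and partitioned communication (`MPI_Psend_init`/`MPI_Pready`, introduced in MPI 4.0, which lets threads mark portions of a buffer ready independently). This is a minimal, hedged sketch of the standard API usage, not code from the Comb suite; buffer sizes, tags, and the iteration count are illustrative assumptions. It must be compiled with an MPI 4.0 implementation (e.g. `mpicc`) and launched with at least two ranks.

```c
/* Sketch: persistent vs. partitioned MPI sends between rank 0 and rank 1.
 * Illustrative only; sizes, tags, and iteration counts are assumptions. */
#include <mpi.h>
#include <stdio.h>

#define NPART 4       /* partitions per message (illustrative) */
#define PARTLEN 256   /* elements per partition (illustrative) */
#define ITERS 10      /* stencil-style iteration count (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[NPART * PARTLEN];
    MPI_Request req;

    /* --- Persistent communication: set up once, reuse every iteration --- */
    if (rank == 0)
        MPI_Send_init(buf, NPART * PARTLEN, MPI_DOUBLE, 1, 0,
                      MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Recv_init(buf, NPART * PARTLEN, MPI_DOUBLE, 0, 0,
                      MPI_COMM_WORLD, &req);

    if (rank < 2) {
        for (int it = 0; it < ITERS; ++it) {
            MPI_Start(&req);               /* reuses the amortized setup */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        MPI_Request_free(&req);
    }

    /* --- Partitioned communication (MPI 4.0): threads could call
     *     MPI_Pready independently as each partition is computed --- */
    if (rank == 0) {
        MPI_Psend_init(buf, NPART, PARTLEN, MPI_DOUBLE, 1, 1,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        for (int p = 0; p < NPART; ++p) {
            /* ... compute partition p (e.g. one thread per partition) ... */
            MPI_Pready(p, req);            /* partition p may now be sent */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    } else if (rank == 1) {
        MPI_Precv_init(buf, NPART, PARTLEN, MPI_DOUBLE, 0, 1,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }

    MPI_Finalize();
    return 0;
}
```

The persistent variant removes per-iteration setup cost, which matches the paper's observation that amortizing setup helps iterative stencils; the partitioned variant additionally avoids serializing threaded sends behind a single `MPI_Isend`, since each thread can flag its own partition with `MPI_Pready`.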
Similar Papers
Do MPI Derived Datatypes Actually Help? A Single-Node Cross-Implementation Study on Shared-Memory Communication
Distributed, Parallel, and Cluster Computing
Makes computer programs share data faster.
Communication-Efficient and Memory-Aware Parallel Bootstrapping using MPI
Distributed, Parallel, and Cluster Computing
Speeds up computer analysis of huge data.
Trace-based, time-resolved analysis of MPI application performance using standard metrics
Distributed, Parallel, and Cluster Computing
Finds hidden computer speed problems in programs.