Examining MPI and its Extensions for Asynchronous Multithreaded Communication
By: Jiakun Yan, Marc Snir, Yanfei Guo
Potential Business Impact:
Speeds up communication inside supercomputers, so scientific applications finish sooner.
The increasing complexity of HPC architectures and the growing adoption of irregular scientific algorithms demand efficient support for asynchronous, multithreaded communication. This need is especially pronounced in Asynchronous Many-Task (AMT) systems. This communication pattern was not a consideration during the design of the original MPI specification. The MPI community has recently introduced several extensions to address these evolving requirements. This work evaluates two such extensions, the Virtual Communication Interface (VCI) and the Continuation extensions, in the context of HPX, an established AMT runtime. We begin by using an MPI-level microbenchmark, modeled on HPX's low-level communication mechanism, to measure the peak performance potential of these extensions. We then integrate them into HPX to evaluate their effectiveness in real-world scenarios. Our results show that while these extensions can enhance performance compared to standard MPI, areas for improvement remain. The current continuation proposal limits the maximum multithreaded message rate achievable in the multi-VCI setting. Furthermore, the recommended one-VCI-per-thread mode proves ineffective in real-world systems due to the attentiveness problem. These findings underscore the importance of improving intra-VCI threading efficiency to achieve scalable multithreaded communication and to fully realize the benefits of recent MPI extensions.
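For concreteness, below is a minimal sketch in C of the communication pattern the abstract describes: MPI_THREAD_MULTIPLE threads, each driving its own duplicated communicator (so an implementation such as MPICH can back each communicator with its own VCI, e.g. when launched with MPIR_CVAR_CH4_NUM_VCIS set), with completion handled through the proposed MPIX continuations interface instead of blocking waits. This is not the authors' benchmark, and the MPIX_Continue_init/MPIX_Continue signatures follow one published version of the continuation proposal; they are assumptions here, not standard MPI.

/* Minimal sketch (assumption: not the paper's actual benchmark) of
 * multithreaded message-rate measurement with one communicator per
 * thread (one VCI per thread under an assumed default comm-to-VCI
 * mapping) and completion via the proposed MPIX continuations API.
 * Build with: mpicc -fopenmp sketch.c; run with exactly 2 ranks. */
#include <mpi.h>
#include <omp.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NMSGS    1000

static atomic_int completed;

/* Callback run by the MPI library when the attached request completes
 * (signature per one version of the continuations proposal). */
static int on_complete(int rc, void *cb_data) {
    (void)rc; (void)cb_data;
    atomic_fetch_add(&completed, 1);
    return MPI_SUCCESS;
}

int main(int argc, char **argv) {
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);

    /* One communicator per thread: distinct communicators are what let
     * the implementation use distinct VCIs. */
    MPI_Comm comms[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    /* Continuation request: continuations are registered against it,
     * and progressing it executes ready callbacks (per the proposal). */
    MPI_Request cont_req;
    MPIX_Continue_init(0, 0, MPI_INFO_NULL, &cont_req);

    static char bufs[NTHREADS][NMSGS]; /* outlive the parallel region */

    #pragma omp parallel num_threads(NTHREADS)
    {
        int tid = omp_get_thread_num();
        for (int m = 0; m < NMSGS; m++) {
            MPI_Request req;
            if (rank == 0)
                MPI_Isend(&bufs[tid][m], 1, MPI_CHAR, 1, tid,
                          comms[tid], &req);
            else
                MPI_Irecv(&bufs[tid][m], 1, MPI_CHAR, 0, tid,
                          comms[tid], &req);
            /* Attach a continuation instead of blocking in MPI_Wait. */
            MPIX_Continue(&req, on_complete, NULL, 0,
                          MPI_STATUS_IGNORE, cont_req);
        }
    }

    /* Drive progress until every continuation has fired. */
    int flag;
    while (atomic_load(&completed) < NTHREADS * NMSGS)
        MPI_Test(&cont_req, &flag, MPI_STATUS_IGNORE);

    for (int i = 0; i < NTHREADS; i++) MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}

The progress loop at the end is where the attentiveness problem the paper identifies would surface: if no thread calls into MPI on a given VCI, continuations registered there may sit unexecuted until something drives progress.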
Similar Papers
Understanding the Communication Needs of Asynchronous Many-Task Systems -- A Case Study of HPX+LCI
Distributed, Parallel, and Cluster Computing
Helps supercomputers run scientific workloads faster.
Contemplating a Lightweight Communication Interface for Asynchronous Many-Task Systems
Distributed, Parallel, and Cluster Computing
Makes computer programs talk to each other faster.
MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems
Distributed, Parallel, and Cluster Computing
Lets supercomputers share data faster, without copying it.