Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism
- University of New Mexico, Albuquerque, NM (United States)
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications implement irregular communication with point-to-point operations, and any optimizations are integrated directly into the application code. As a result, these optimizations lack portability. It is difficult to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for implementing existing optimizations for irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows up to a 1.38x speedup on sparse matrix-vector multiplication communication within a BoomerAMG solve through the use of our optimized neighbor collectives. Here, the authors analyze three implementations of persistent neighborhood collectives for Alltoallv: an unoptimized wrapper of standard point-to-point communication, and two locality-aware aggregating methods. The second locality-aware implementation exposes a non-standard interface to perform additional optimization, and the authors quantify the additional 0.07x speedup obtained through this extended interface. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing optimizations to be added into existing codebases regardless of the system MPI install.
- Research Organization:
- Univ. of New Mexico, Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
- DOE Contract Number:
- NA0003966; CCF-2151022
- OSTI ID:
- 2205620
- Journal Information:
- Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Conference: SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO (United States), 12-17 Nov 2023
- Country of Publication:
- United States
- Language:
- English