Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism
Conference · Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
- University of New Mexico, Albuquerque, NM (United States)
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- University of New Mexico, Albuquerque, NM (United States)
Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications implement irregular communication individually as point-to-point messages, with any optimizations integrated directly into the application; as a result, these optimizations lack portability. Point-to-point messages are difficult to optimize within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for implementing existing optimizations for irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows up to a 1.38x speedup on sparse matrix-vector multiplication communication within a BoomerAMG solve through the use of our optimized neighbor collectives. Here, the authors analyze three implementations of persistent neighborhood collectives for Alltoallv: an unoptimized wrapper of standard point-to-point communication, and two locality-aware aggregating methods. The second locality-aware implementation exposes a non-standard interface to perform additional optimization, and the authors present the additional 0.07x speedup from this extended interface. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing the optimizations to be added into existing codebases regardless of the system MPI install.
- Research Organization: University of New Mexico, Albuquerque, NM (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
- DOE Contract Number: NA0003966
- OSTI ID: 2205620
- Journal Name: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
- Country of Publication: United States
- Language: English