U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

Conference · Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications implement irregular communication individually as point-to-point messages, and any optimizations are integrated directly into the application. As a result, these optimizations lack portability. It is difficult to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for implementing existing optimizations for irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows up to a 1.38x speedup on sparse matrix-vector multiplication communication within a BoomerAMG solve through the use of our optimized neighbor collectives. Here, the authors analyze three implementations of persistent neighborhood collectives for Alltoallv: an unoptimized wrapper of standard point-to-point communication, and two locality-aware aggregating methods. The second locality-aware implementation exposes a non-standard interface to perform additional optimization, and the authors report an additional 0.07x speedup from this extended interface. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing optimizations to be added to existing codebases regardless of the system's MPI installation.
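For context, the sketch below (not code from the paper) shows how an application might hand its irregular communication pattern to the MPI 4.0 persistent neighbor collective interface this work targets: the pattern is declared once with MPI_Dist_graph_create_adjacent, the exchange is set up once with MPI_Neighbor_alltoallv_init, and the same request is restarted each iteration, for example for the sparse matrix-vector product communication inside a BoomerAMG solve. The ring neighbor lists, counts, and displacements are illustrative placeholders; per the abstract, MPI Advance layers its optimized implementations over a comparable interface on top of the system MPI.

/* Minimal sketch, assuming a simple ring exchange pattern; all counts,
 * displacements, and neighbor lists are hypothetical placeholders. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Example pattern: each rank receives from and sends to two neighbors. */
    int sources[2]      = { (rank - 1 + nprocs) % nprocs, (rank + 1) % nprocs };
    int destinations[2] = { (rank + 1) % nprocs, (rank - 1 + nprocs) % nprocs };

    /* Describe the full communication pattern to MPI so the library can
       optimize the collection of messages as a whole, not one at a time. */
    MPI_Comm neighbor_comm;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, sources, MPI_UNWEIGHTED,
                                   2, destinations, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0 /* no reordering */,
                                   &neighbor_comm);

    int sendcounts[2] = { 4, 4 }, recvcounts[2] = { 4, 4 };
    int sdispls[2]    = { 0, 4 }, rdispls[2]    = { 0, 4 };
    double sendbuf[8] = { 0.0 }, recvbuf[8] = { 0.0 };

    /* MPI 4.0 persistent variant: set up once, reuse every iteration. */
    MPI_Request req;
    MPI_Neighbor_alltoallv_init(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                                recvbuf, recvcounts, rdispls, MPI_DOUBLE,
                                neighbor_comm, MPI_INFO_NULL, &req);

    for (int iter = 0; iter < 100; iter++) {
        /* ... fill sendbuf with values for each destination ... */
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* ... consume recvbuf ... */
    }

    MPI_Request_free(&req);
    MPI_Comm_free(&neighbor_comm);
    MPI_Finalize();
    return 0;
}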

Research Organization:
Univ. of New Mexico, Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
DOE Contract Number:
NA0003966; CCF-2151022
OSTI ID:
2205620
Journal Information:
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis; Conference: SC-W 2023, Denver, CO (United States), 12-17 Nov 2023
Country of Publication:
United States
Language:
English
