U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

Conference · Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
 [1];  [2];  [3]
  1. University of New Mexico, Albuquerque, NM (United States)
  2. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  3. University of New Mexico, Albuquerque, NM (United States)
Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications implement irregular communication individually as point-to-point messages, and any optimizations are integrated directly into the application. As a result, these optimizations lack portability. It is difficult to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for implementing existing optimizations for irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows up to a 1.38x speedup on sparse matrix-vector multiplication communication within a BoomerAMG solve through the use of our optimized neighbor collectives. Here, the authors analyze three implementations of persistent neighborhood collectives for Alltoallv: an unoptimized wrapper of standard point-to-point communication, and two locality-aware aggregating methods. The second locality-aware implementation exposes a non-standard interface to perform additional optimization, and the authors present the additional 0.07x speedup from the extended interface. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing optimizations to be added into existing codebases regardless of the underlying system MPI installation.
Research Organization:
University of New Mexico, Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
DOE Contract Number:
NA0003966
OSTI ID:
2205620
Conference Information:
Journal Name: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
Country of Publication:
United States
Language:
English
