U.S. Department of Energy
Office of Scientific and Technical Information

Hardware MPI message matching: Insights into MPI matching behavior to inform design

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.5150 · OSTI ID: 1501630
 [1];  [1];  [1];  [1];  [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

This paper explores key differences among the MPI match lists of several important United States Department of Energy (DOE) applications and proxy applications. Understanding these differences is critical to determining the most promising hardware matching design for a given high-speed network. We present the results of MPI match list studies for the major open-source MPI implementations, MPICH and Open MPI, and we modify an MPI simulator, LogGOPSim, to provide match list statistics. These results are discussed in the context of several potential design approaches to MPI matching–capable hardware. The data illustrate the requirements that different hardware designs must meet in terms of performance and memory capacity. The paper's contributions are the collection and analysis of data that inform hardware designers of common MPI requirements, and they highlight the difficulty of determining these requirements by examining only a single MPI implementation.
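
For readers unfamiliar with what a match list statistic is, the following minimal Python sketch models one process's posted-receive and unexpected-message lists and records how many entries each search examines. It is an illustration of standard MPI matching semantics (communicator, source, and tag, with wildcards, searched oldest first), not the instrumentation used in the paper or any code from MPICH, Open MPI, or LogGOPSim. The names `MatchList`, `post_recv`, `arrive`, and `depths` are hypothetical, and the constants only stand in for `MPI_ANY_SOURCE` and `MPI_ANY_TAG`.

```python
from dataclasses import dataclass, field

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE (assumed value, illustration only)
ANY_TAG = -1     # stand-in for MPI_ANY_TAG (assumed value, illustration only)


@dataclass
class MatchList:
    """Toy model of one process's MPI matching state (hypothetical names)."""
    posted: list = field(default_factory=list)      # receives as (comm, source, tag), oldest first
    unexpected: list = field(default_factory=list)  # messages as (comm, source, tag), oldest first
    depths: list = field(default_factory=list)      # entries examined by each search

    @staticmethod
    def _matches(recv, msg):
        # A receive matches a message when the communicators are equal and
        # the source and tag are either equal or wildcarded on the receive.
        r_comm, r_src, r_tag = recv
        m_comm, m_src, m_tag = msg
        return (r_comm == m_comm
                and r_src in (ANY_SOURCE, m_src)
                and r_tag in (ANY_TAG, m_tag))

    def _search(self, items, pred):
        # Linear search from the oldest entry; record how far it went.
        for i, item in enumerate(items):
            if pred(item):
                self.depths.append(i + 1)
                return items.pop(i)
        self.depths.append(len(items))  # full traversal, no match
        return None

    def post_recv(self, comm, source, tag):
        """Post a receive: check unexpected messages first, else queue it."""
        recv = (comm, source, tag)
        msg = self._search(self.unexpected, lambda m: self._matches(recv, m))
        if msg is None:
            self.posted.append(recv)
        return msg

    def arrive(self, comm, source, tag):
        """Handle an incoming message: check posted receives, else queue it."""
        msg = (comm, source, tag)
        recv = self._search(self.posted, lambda r: self._matches(r, msg))
        if recv is None:
            self.unexpected.append(msg)
        return recv


if __name__ == "__main__":
    ml = MatchList()
    ml.post_recv(0, 1, 10)
    ml.post_recv(0, 2, 20)
    ml.post_recv(0, ANY_SOURCE, 30)
    ml.arrive(0, 3, 30)          # matched by the wildcard receive
    ml.arrive(0, 9, 99)          # no posted receive matches; becomes unexpected
    ml.post_recv(0, 9, ANY_TAG)  # matched against the unexpected list
    print("search depths:", ml.depths)
    print("mean depth:", sum(ml.depths) / len(ml.depths))
```

A linear list is used here only because it makes search depth easy to count. List lengths and search depths of this kind are the sort of quantity a hardware design would have to accommodate in performance and memory capacity.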

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000; NA0003525; AC02-05CH11231
OSTI ID:
1501630
Alternate ID(s):
OSTI ID: 1511803
Report Number(s):
SAND--2019-0943J; 671923
Journal Information:
Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
