
Preparing MPICH for exascale

Journal Article · International Journal of High Performance Computing Applications
Author affiliations:
  1. Argonne National Laboratory (ANL), Argonne, IL (United States)
  2. Argonne National Laboratory (ANL), Argonne, IL (United States); Meta, Palo Alto, CA (United States)
  3. Argonne National Laboratory (ANL), Argonne, IL (United States); Cerebras Systems, Sunnyvale, CA (United States)
  4. Argonne National Laboratory (ANL), Argonne, IL (United States); Klaytn Foundation (Singapore)
  5. Argonne National Laboratory (ANL), Argonne, IL (United States); NVIDIA Corporation, Santa Clara, CA (United States)
  6. Argonne National Laboratory (ANL), Argonne, IL (United States); FernUniversität in Hagen (Germany)
  7. NVIDIA Corporation, Santa Clara, CA (United States); Univ. of California, Irvine, CA (United States)
  8. NVIDIA Corporation, Santa Clara, CA (United States); Univ. of California, Riverside, CA (United States)
  9. Intel Corporation, Santa Clara, CA (United States)
  10. Meta, Palo Alto, CA (United States); Intel Corporation, Santa Clara, CA (United States)
  11. Intel Corporation, Santa Clara, CA (United States); Microsoft Corporation, Redmond, WA (United States)
  12. NVIDIA Corporation, Santa Clara, CA (United States); Intel Corporation, Santa Clara, CA (United States)
  13. Intel Corporation, Santa Clara, CA (United States); Fastly, San Francisco, CA (United States)
  14. Hewlett Packard Enterprise, Palo Alto, CA (United States)
  15. Univ. of Illinois at Urbana-Champaign, IL (United States)
The advent of exascale supercomputers heralds a new era of scientific discovery, yet it also introduces significant architectural challenges that MPI applications must overcome to exploit these systems fully. Among these challenges is the adoption of heterogeneous architectures, particularly the integration of GPUs to accelerate computation. The complexity of multithreaded programming models has likewise become a critical factor in achieving performance at scale. Efficient use of the hardware-accelerated communication provided by modern NICs is also essential for achieving low-latency, high-throughput communication on such complex systems. In response to these challenges, the MPICH library, a high-performance and widely used Message Passing Interface (MPI) implementation, has undergone significant enhancements. This paper presents four major contributions that prepare MPICH for the exascale transition. First, we describe a lightweight communication stack that leverages the advanced features of modern NICs to maximize hardware acceleration. Second, we showcase a highly scalable multithreaded communication model that addresses the complexities of concurrent environments. Third, we introduce GPU-aware communication capabilities that optimize data movement on GPU-integrated systems. Finally, we present a new datatype engine that accelerates the use of MPI derived datatypes on GPUs. These improvements to the MPICH library not only address the immediate needs of exascale computing architectures but also lay a foundation for exploiting future innovations in high-performance computing. By adopting these new designs, MPICH-derived libraries from HPE Cray and Intel achieved exascale performance on OLCF Frontier and ALCF Aurora, respectively.
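
To make the abstract's themes concrete, the following is a minimal illustrative sketch, not taken from the paper: it requests full thread support, builds a strided MPI derived datatype, and passes a GPU-resident buffer directly to point-to-point calls, as a GPU-aware MPICH build permits. The buffer size and strided layout are hypothetical and chosen only for illustration; the sketch assumes an MPICH build with CUDA-based GPU support and at least two ranks.

    /* Illustrative sketch (not from the paper): GPU-aware point-to-point
     * communication with a derived datatype. Assumes a GPU-aware MPICH build
     * with CUDA support; buffer size and strided layout are hypothetical. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Request full thread support, as a multithreaded communication
         * model requires. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Allocate the message buffer in GPU memory; a GPU-aware MPI accepts
         * this device pointer directly, with no manual host staging copy. */
        double *dbuf = NULL;
        cudaMalloc((void **)&dbuf, (size_t)1 << 20);   /* 1 MiB = 131072 doubles */

        /* Derived datatype: every 4th double of the buffer, i.e. a strided
         * layout that a GPU datatype engine can pack/unpack on the device. */
        MPI_Datatype strided;
        MPI_Type_vector(32768, 1, 4, MPI_DOUBLE, &strided);
        MPI_Type_commit(&strided);

        if (rank == 0)
            MPI_Send(dbuf, 1, strided, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(dbuf, 1, strided, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Type_free(&strided);
        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }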
Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES), Scientific User Facilities (SUF)
Grant/Contract Number:
AC02-06CH11357; AC05-00OR22725
OSTI ID:
2506860
Alternate ID(s):
OSTI ID: 3005770
Journal Information:
International Journal of High Performance Computing Applications, Vol. 39, Issue 2; ISSN 1094-3420; ISSN 1741-2846
Publisher:
SAGE
Country of Publication:
United States
Language:
English

Similar Records

Designing and prototyping extensions to the Message Passing Interface in MPICH
Journal Article · August 2024 · International Journal of High Performance Computing Applications · OSTI ID: 2571429

MPICH-G2: a grid-enabled implementation of the message passing interface
Journal Article · May 2003 · J. Parallel Distrib. Comput. · OSTI ID: 949654
