Bringing HPE Slingshot 11 support to Open MPI
Journal Article · Concurrency and Computation. Practice and Experience
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
The Cray HPE Slingshot 11 network is used on the new exascale systems arriving at the U.S. Department of Energy (DOE) laboratories (e.g., Frontier, Aurora, Perlmutter). Supporting this network is therefore essential to meeting the needs of exascale applications. This article highlights recent work on the infrastructure that enables Open MPI to support these new platforms efficiently. A key component of this effort is the development of a new OpenFabrics Interfaces (OFI) provider, LinkX. We discuss the design and development of enhancements that take advantage of the new Slingshot 11 network and AMD GPUs. We include performance data from tests on the Frontier supercomputer using synthetic communication benchmarks, with the vendor-provided MPI as the baseline for comparison. The tests demonstrate full functionality of Open MPI on the system, and initial results show favorable performance when compared to the highly tuned vendor implementation.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
- Grant/Contract Number:
- 89233218CNA000001; AC05-00OR22725
- OSTI ID:
- 2438730
- Alternate ID(s):
- OSTI ID: 2404434
- Journal Information:
- Concurrency and Computation. Practice and Experience, Vol. 36, Issue 22; ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
Similar Records
HPC Molecular Simulation Tries Out a New GPU: Experiences on Early AMD Test Systems for the Frontier Supercomputer
Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs
Conference · Wed Jun 01 00:00:00 EDT 2022 · OSTI ID: 1883870
Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs
Journal Article · Wed Apr 10 20:00:00 EDT 2024 · Concurrency and Computation. Practice and Experience · OSTI ID: 2336800