U.S. Department of Energy
Office of Scientific and Technical Information

Bringing HPE Slingshot 11 support to Open MPI

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.8203 · OSTI ID: 2438730
The Cray HPE Slingshot 11 network is used on the new exascale systems arriving at U.S. Department of Energy (DOE) laboratories (e.g., Frontier, Aurora, Perlmutter). Supporting this network is therefore an important capability for meeting the needs of exascale applications. This article highlights recent work to develop the infrastructure that enables Open MPI to support these new platforms efficiently. A key component of this effort is the development of a new OpenFabrics Interfaces (OFI) provider, LinkX. We discuss the design and development of enhancements that take advantage of the Slingshot 11 network and AMD GPUs. We include performance data from tests on the Frontier supercomputer using synthetic communication benchmarks, with the vendor-provided MPI as a baseline for comparison. The tests demonstrate full functionality of Open MPI on the system, and initial results show favorable performance compared to the highly tuned vendor implementation.
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
Grant/Contract Number:
89233218CNA000001; AC05-00OR22725
OSTI ID:
2438730
Alternate ID(s):
OSTI ID: 2404434
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 36, Issue 22; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
