U.S. Department of Energy
Office of Scientific and Technical Information

An evaluation of the CORAL interconnects

Conference

The US Department of Energy deployed the Summit and Sierra supercomputers in 2018 with state-of-the-art network interconnect technology, and both systems entered production in 2019. In this paper, we provide an in-depth assessment of the systems' network interconnects, which are based on Enhanced Data Rate (EDR) 100 Gb/s Mellanox InfiniBand. Both systems use second-generation EDR Host Channel Adapters (HCAs) and switches with several new features, such as Adaptive Routing (AR), switch-based collectives, and HCA-based tag matching. Although based on the same components, Summit's network is "non-blocking" (i.e., a fully provisioned Clos network), whereas Sierra's network has a 2:1 taper between the racks and aggregation switches. We evaluate the two systems' interconnects using traditional communication benchmarks as well as production applications. We find that the new Adaptive Routing dramatically improves performance, but the other new features still need improvement.
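As a rough illustration of what "non-blocking" versus a 2:1 taper means for off-rack traffic, the sketch below compares the worst-case per-node bandwidth when every node under a switch sends upward at once. It is not taken from the paper: the function names and the 18/18 and 24/12 port splits are hypothetical values chosen only to produce 1:1 and 2:1 ratios, and the model collapses the rack-to-aggregation boundary into a single switch level. Only the 100 Gb/s EDR link rate comes from the abstract.

```python
from fractions import Fraction

EDR_LINK_GBPS = 100  # EDR InfiniBand link rate, as stated in the abstract


def taper_ratio(down_links: int, up_links: int) -> Fraction:
    """Ratio of node-facing (down) links to network-facing (up) links."""
    return Fraction(down_links, up_links)


def worst_case_offrack_gbps(down_links: int, up_links: int,
                            link_gbps: float = EDR_LINK_GBPS) -> float:
    """Per-node bandwidth if all nodes send off-rack simultaneously.

    Simplified model (an assumption, not the paper's method): the uplink
    capacity is shared evenly, and no node can exceed its own link rate.
    """
    uplink_capacity = up_links * link_gbps
    return min(link_gbps, uplink_capacity / down_links)


# Hypothetical port splits, chosen only to illustrate the two ratios.
for label, down, up in [("non-blocking (1:1)", 18, 18),
                        ("tapered (2:1)", 24, 12)]:
    ratio = taper_ratio(down, up)
    bw = worst_case_offrack_gbps(down, up)
    print(f"{label}: taper {ratio}:1, worst-case {bw:g} Gb/s per node")
```

Under these assumptions, a 2:1 taper halves the bandwidth available to each node only when all nodes communicate off-rack at the same time; traffic that stays within the rack is unaffected.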

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1761744
Country of Publication:
United States
Language:
English
