Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation
- Rensselaer Polytechnic Inst., Troy, NY (United States)
- Argonne National Lab. (ANL), Lemont, IL (United States)
As supercomputers approach exascale performance, the increased number of processors translates to an increased demand on the underlying network interconnect. The slim fly network topology, a new low-diameter, low-latency, and low-cost interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this article, we present a high-fidelity slim fly packet-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate the model against published work before scaling the network size up to an unprecedented 1 million compute nodes and confirming that the slim fly observes peak network throughput at extreme scale. In addition to synthetic workloads, we evaluate large-scale slim fly models with real communication workloads from applications in the Design Forward program with over 110,000 MPI processes. We show strong scaling of the slim fly model on an Intel cluster, achieving a peak network packet transfer rate of 2.3 million packets per second and processing over 7 billion discrete events using 128 MPI tasks. Enabled by the strong performance capabilities of the model, we perform a detailed application trace and routing protocol performance study. Lastly, through analysis of metrics such as packet latency, hop count, and congestion, we find that the slim fly network is able to leverage simple minimal routing and achieve the same performance as more complex adaptive routing for the tested DOE benchmark applications.
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); Air Force Research Laboratory (AFRL)
- Grant/Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1488539
- Journal Information:
- ACM Transactions on Modeling and Computer Simulation, Vol. 28, Issue 4; ISSN 1049-3301
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
Similar Records
Enabling parallel simulation of large-scale HPC network systems
Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation