Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation
Abstract
To meet the demand for exascale-level performance from high-performance computing (HPC) interconnects, many system architects are turning to simulation results for accurate and reliable predictions of the performance of prospective technologies. Testing full-scale networks with a variety of benchmarking tools, including synthetic workloads and application traces, can give crucial insight into which ideas are most promising without needing to physically construct a test network. While flexible, however, this approach is extremely compute-time intensive. We address this time complexity challenge through the use of large-scale, optimistic parallel simulation that ultimately leads to faster HPC network architecture innovations. In this paper we demonstrate this innovation capability through a real-world network design case study. Specifically, we have simulated and compared four extreme-scale interconnects: Dragonfly, Megafly, Slim Fly, and a new dual-rail-dual-plane variation of the Slim Fly network topology. We present this new variant of Slim Fly, dubbed Fit Fly, to show how interconnect innovation and evaluation, beyond what is possible through analytic methods, can be achieved through parallel simulation. We validate and compare the model with various network designs using the CODES interconnect simulation framework. By running large-scale simulations in a parallel environment, we are able to quickly generate reliable performance results that can help network designers break ground on the next generation of high-performance network designs.
- Authors:
- McGlohon, Neil; Wolfe, Noah; Mubarak, Misbah; Carothers, Christopher
- Publication Date:
- 2019
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science - Office of Advanced Scientific Computing Research
- OSTI Identifier:
- 1574984
- DOE Contract Number:
- AC02-06CH11357
- Resource Type:
- Conference
- Resource Relation:
- Conference: 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 06/03/19 - 06/05/19, Chicago, IL, US
- Country of Publication:
- United States
- Language:
- English
- Subject:
- High Performance Computing; Interconnection Networks; Modeling; Parallel Discrete Event Simulation
Citation Formats
McGlohon, Neil, Wolfe, Noah, Mubarak, Misbah, and Carothers, Christopher. Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation. United States: N. p., 2019. Web. doi:10.1145/3316480.3325515.
McGlohon, Neil, Wolfe, Noah, Mubarak, Misbah, & Carothers, Christopher. Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation. United States. doi:10.1145/3316480.3325515.
McGlohon, Neil, Wolfe, Noah, Mubarak, Misbah, and Carothers, Christopher. 2019. "Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation". United States. doi:10.1145/3316480.3325515.
@article{osti_1574984,
title = {Fit Fly: A Case Study of Interconnect Innovation through Parallel Simulation},
author = {McGlohon, Neil and Wolfe, Noah and Mubarak, Misbah and Carothers, Christopher},
abstractNote = {To meet the demand for exascale-level performance from high-performance computing (HPC) interconnects, many system architects are turning to simulation results for accurate and reliable predictions of the performance of prospective technologies. Testing full-scale networks with a variety of benchmarking tools, including synthetic workloads and application traces, can give crucial insight into which ideas are most promising without needing to physically construct a test network. While flexible, however, this approach is extremely compute-time intensive. We address this time complexity challenge through the use of large-scale, optimistic parallel simulation that ultimately leads to faster HPC network architecture innovations. In this paper we demonstrate this innovation capability through a real-world network design case study. Specifically, we have simulated and compared four extreme-scale interconnects: Dragonfly, Megafly, Slim Fly, and a new dual-rail-dual-plane variation of the Slim Fly network topology. We present this new variant of Slim Fly, dubbed Fit Fly, to show how interconnect innovation and evaluation, beyond what is possible through analytic methods, can be achieved through parallel simulation. We validate and compare the model with various network designs using the CODES interconnect simulation framework. By running large-scale simulations in a parallel environment, we are able to quickly generate reliable performance results that can help network designers break ground on the next generation of high-performance network designs.},
doi = {10.1145/3316480.3325515},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {1}
}