U.S. Department of Energy
Office of Scientific and Technical Information

Hybrid PDES Simulation of HPC Networks Using Zombie Packets

Journal Article · ACM Transactions on Modeling and Computer Simulation
DOI: https://doi.org/10.1145/3682060 · OSTI ID: 3017061
Although high-fidelity network simulations have proven to be reliable and cost-effective tools for examining architectural questions in high-performance computing (HPC) networks, they incur a high resource cost. Simulating a single millisecond of network traffic in the highest detail can take hours, even for static, well-behaved traffic patterns such as uniform random. Surrogate models offer a significant reduction in runtime, yet they cannot serve as complete replacements and should only be used when appropriate. Thus, there is a need for hybrid modeling, where high-fidelity simulation and surrogates run side by side. Here, we present a surrogate model for HPC networks in which packets bypass the network while the network state is left untouched, i.e., suspended. To bypass the network, we use historical data to estimate the arrival time at which each packet should be scheduled; to suspend the network, all in-flight packets are scheduled to arrive at their destinations and are kept in the system to awaken as zombies when switching back to high-fidelity simulation. The speedup of a hybrid model depends on the proportion of surrogate to high-fidelity simulation. This lightweight surrogate obtained up to 76× speedup. Keeping the zombies in the network increased the accuracy of the high-fidelity simulation on restart compared with restarting the network from an empty state.
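The two ideas in the abstract, estimating arrival times from historical data and the dependence of hybrid speedup on the surrogate proportion, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the article: the class `LatencySurrogate`, the choice of a per-(source, destination) mean latency as the estimator, and the Amdahl-style formula in `hybrid_speedup` are hypothetical stand-ins, since the abstract does not specify the actual estimator or speedup model.

```python
from collections import defaultdict
from statistics import mean


class LatencySurrogate:
    """Hypothetical surrogate: predicts arrival times from observed latencies.

    Assumption: the mean latency per (src, dst) pair is used as the estimate.
    The article's actual estimator is not described in the abstract.
    """

    def __init__(self):
        # Maps (src, dst) -> list of latencies seen during high-fidelity runs.
        self._history = defaultdict(list)

    def observe(self, src, dst, latency):
        """Record one latency sample gathered in high-fidelity mode."""
        self._history[(src, dst)].append(latency)

    def predict_arrival(self, src, dst, send_time):
        """Return the estimated arrival time of a packet that bypasses the network."""
        samples = self._history.get((src, dst))
        if not samples:
            raise KeyError(f"no historical data for pair {(src, dst)}")
        return send_time + mean(samples)


def hybrid_speedup(surrogate_fraction, surrogate_speedup):
    """Amdahl-style estimate of overall speedup (an assumed model).

    surrogate_fraction: share of the baseline high-fidelity runtime that is
        replaced by the surrogate, between 0 and 1.
    surrogate_speedup: how much faster the surrogate runs that share.
    """
    if not 0.0 <= surrogate_fraction <= 1.0:
        raise ValueError("surrogate_fraction must be in [0, 1]")
    if surrogate_speedup <= 0:
        raise ValueError("surrogate_speedup must be positive")
    remaining = 1.0 - surrogate_fraction
    return 1.0 / (remaining + surrogate_fraction / surrogate_speedup)
```

Under this assumed model, a run that never switches to the surrogate has a speedup of 1, and the overall speedup approaches the surrogate's own speedup only as the surrogate proportion approaches 1.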
Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
3017061
Journal Information:
ACM Transactions on Modeling and Computer Simulation, Vol. 35, Issue 2; ISSN 1049-3301; ISSN 1558-1195
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
