DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

Abstract

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
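The idea of logical copies without full physical duplication can be illustrated with a minimal sketch. It is not the CloneX interface or algorithm; it only shows block-level copy-on-write sharing for a time-stepped 1D heat diffusion grid, under the assumption that all simulations advance in lock step and that only clones (not their parents) are perturbed. The names `Sim`, `step_all`, `run_demo`, `BLOCK`, and `ALPHA` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: block-level copy-on-write cloning of a
# time-stepped 1D heat diffusion grid. Not the CloneX API.

BLOCK = 4      # cells per block (assumed value)
ALPHA = 0.25   # diffusion coefficient for the explicit update (assumed value)


class Sim:
    """A simulation that physically stores only the blocks it owns."""

    def __init__(self, own, parent=None):
        self.own = own          # block index -> list of cell values
        self.parent = parent    # blocks not owned are read from the parent

    def block(self, b):
        sim = self
        while b not in sim.own:     # walk up the clone tree
            sim = sim.parent
        return sim.own[b]

    def cell(self, i):
        return self.block(i // BLOCK)[i % BLOCK]

    def clone(self):
        return Sim({}, parent=self)  # logical copy, no data duplicated

    def write(self, i, value):
        b = i // BLOCK
        if b not in self.own:        # copy the block on first write
            self.own[b] = list(self.block(b))
        self.own[b][i % BLOCK] = value


def stepped_block(sim, b, nblocks):
    """Return block b of sim after one explicit diffusion step."""
    n = nblocks * BLOCK
    out = []
    for i in range(b * BLOCK, (b + 1) * BLOCK):
        u = sim.cell(i)
        if i == 0 or i == n - 1:     # fixed boundary cells
            out.append(u)
        else:
            out.append(u + ALPHA * (sim.cell(i - 1) - 2 * u + sim.cell(i + 1)))
    return out


def step_all(sims, nblocks):
    """Advance every simulation by one step, in lock step."""
    new_own = []
    for sim in sims:
        if sim.parent is None:
            dirty = range(nblocks)   # a root computes every block
        else:
            # A clone recomputes only blocks that it owns or that border
            # an owned block; all other blocks stay shared with the parent.
            dirty = sorted({n for b in sim.own for n in (b - 1, b, b + 1)
                            if 0 <= n < nblocks})
        new_own.append({b: stepped_block(sim, b, nblocks) for b in dirty})
    for sim, own in zip(sims, new_own):  # commit after all reads are done
        sim.own = own


def run_demo(nblocks=16, steps_before=3, steps_after=3):
    cells = [0.0] * (nblocks * BLOCK)
    cells[4] = 100.0                              # base scenario heat source
    root = Sim({b: cells[b * BLOCK:(b + 1) * BLOCK] for b in range(nblocks)})
    for _ in range(steps_before):
        step_all([root], nblocks)
    clone = root.clone()
    clone.write(nblocks * BLOCK - 2, 50.0)        # what-if: extra heat source
    # Full physical copy of the same scenario, used only as a reference.
    replica = Sim({b: list(clone.block(b)) for b in range(nblocks)})
    for _ in range(steps_after):
        step_all([root, clone, replica], nblocks)
    same = all(clone.cell(i) == replica.cell(i)
               for i in range(nblocks * BLOCK))
    return len(clone.own), same
```

In this sketch, `run_demo()` compares the clone against a fully replicated run of the same scenario and reports how many blocks the clone physically stores. The set of recomputed blocks grows by one neighboring block per step, which is a conservative simplification; it says nothing about how CloneX itself tracks divergence or balances load across GPUs.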

Authors:
Yoginath, Srikanth B. [1]; Perumalla, Kalyan S. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
January 2018
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1424471
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Modeling and Computer Simulation
Additional Journal Information:
Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1049-3301
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; algorithms; design; experimentation; performance; graphical processing units; CUDA; load balancing; time synchronization; what-if decision tree; supercomputing

Citation Formats

Yoginath, Srikanth B., and Perumalla, Kalyan S. Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids. United States: N. p., 2018. Web. doi:10.1145/3158669.
Yoginath, Srikanth B., & Perumalla, Kalyan S. Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids. United States. doi:10.1145/3158669.
Yoginath, Srikanth B., and Perumalla, Kalyan S. "Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids". United States. doi:10.1145/3158669. https://www.osti.gov/servlets/purl/1424471.
@article{osti_1424471,
title = {Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids},
author = {Yoginath, Srikanth B. and Perumalla, Kalyan S.},
abstractNote = {Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.},
doi = {10.1145/3158669},
journal = {ACM Transactions on Modeling and Computer Simulation},
number = 1,
volume = 28,
place = {United States},
year = {2018},
month = {1}
}

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science

Works referenced in this record:

Performance of Point and Range Queries for In-memory Databases Using Radix Trees on GPUs
conference, December 2016

  • Alam, Maksudul; Yoginath, Srikanth B.; Perumalla, Kalyan S.
  • 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • DOI: 10.1109/HPCC-SmartCity-DSS.2016.0212

Dynamic modelling of fire spread across a fuel bed
journal, January 1999

  • Balbi, J. H.; Santoni, P. A.; Dupuy, J. L.
  • International Journal of Wildland Fire, Vol. 9, Issue 4
  • DOI: 10.1071/WF00005

Exact-Differential Large-Scale Traffic Simulation
conference, January 2015

  • Hanai, Masatoshi; Suzumura, Toyotaro; Theodoropoulos, Georgios
  • Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15
  • DOI: 10.1145/2769458.2769472

Algorithms for HLA-based distributed simulation cloning
journal, October 2005

  • Chen, Dan; Turner, Stephen J.; Cai, Wentong
  • ACM Transactions on Modeling and Computer Simulation, Vol. 15, Issue 4
  • DOI: 10.1145/1113316.1113318

Discrete Event Simulations and Parallel Processing: Statistical Properties
journal, November 1988

  • Heidelberger, Philip
  • SIAM Journal on Scientific and Statistical Computing, Vol. 9, Issue 6
  • DOI: 10.1137/0909077

High performance computational steering of physical simulations
conference, January 1997


Real-Time Performance Monitoring, Adaptive Control, and Interactive Steering of Computational Grids
journal, November 2000

  • Vetter, Jeffrey S.; Reed, Daniel A.
  • The International Journal of High Performance Computing Applications, Vol. 14, Issue 4
  • DOI: 10.1177/109434200001400407

Cloning Agent-based Simulation on GPU
conference, January 2015

  • Li, Xiaosong; Cai, Wentong; Turner, Stephen John
  • Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15
  • DOI: 10.1145/2769458.2769470

Cloning parallel simulations
journal, October 2001

  • Hybinette, Maria; Fujimoto, Richard M.
  • ACM Transactions on Modeling and Computer Simulation, Vol. 11, Issue 4
  • DOI: 10.1145/508366.508370

Cloning Agent-Based Simulation
journal, May 2017

  • Li, Xiaosong; Cai, Wentong; Turner, Stephen J.
  • ACM Transactions on Modeling and Computer Simulation, Vol. 27, Issue 2
  • DOI: 10.1145/3013529