DOE PAGES

This content will become publicly available on January 31, 2019

Title: Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
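The idea of a logical copy that avoids full physical duplication can be illustrated with a small single-process sketch. It is not the CloneX interface or algorithm: the names `step`, `Clone`, `delta`, and `materialize` are hypothetical, and a 1D explicit heat-diffusion grid stands in for the paper's benchmarks. The clone stores only the cells that differ from the base run and recomputes only the cells next to a difference, yet it reproduces what a fully replicated run would compute.

```python
# Hypothetical illustration only; names and structure are assumptions,
# not the CloneX runtime interface.

def step(u, alpha=0.25):
    """One explicit heat-diffusion step on a 1D grid with fixed end cells."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


class Clone:
    """A logical copy of a base run that stores only cells differing from it."""

    def __init__(self, overrides):
        # Maps cell index -> value that differs from the base run.
        self.delta = dict(overrides)

    def value(self, base, i):
        """Read cell i of the clone, falling back to the shared base state."""
        return self.delta.get(i, base[i])

    def step(self, base_old, base_new, alpha=0.25):
        """Advance one step, recomputing only cells adjacent to a difference."""
        n = len(base_old)
        # Fixed boundary cells keep any override they carry.
        new_delta = {i: v for i, v in self.delta.items() if i in (0, n - 1)}
        touched = set()
        for i in self.delta:
            touched.update(j for j in (i - 1, i, i + 1) if 0 < j < n - 1)
        for i in touched:
            left = self.value(base_old, i - 1)
            mid = self.value(base_old, i)
            right = self.value(base_old, i + 1)
            v = mid + alpha * (left - 2 * mid + right)
            if v != base_new[i]:  # store only cells that still diverge
                new_delta[i] = v
        self.delta = new_delta

    def materialize(self, base):
        """Build the clone's full state (needed only for inspection)."""
        return [self.value(base, i) for i in range(len(base))]


# Base scenario: heat enters at the left edge of a 16-cell grid.
base = [1.0] + [0.0] * 15
# What-if scenario: an extra hot spot at cell 8.
clone = Clone({8: 5.0})
# Fully replicated run of the same what-if scenario, for comparison.
replica = base[:]
replica[8] = 5.0

for _ in range(3):
    base_next = step(base)
    clone.step(base, base_next)
    base = base_next
    replica = step(replica)
```

After the loop, `clone.materialize(base)` matches `replica`, while `clone.delta` holds only the cells the hot spot has reached. A distributed, GPU-based system such as the one the abstract describes must additionally handle load balancing and dependencies across a whole clone tree, which this sketch does not attempt.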
Authors:
Yoginath, Srikanth B. [1]; Perumalla, Kalyan S. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
January 2018
Grant/Contract Number:
AC05-00OR22725
Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Modeling and Computer Simulation
Additional Journal Information:
Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1049-3301
Publisher:
Association for Computing Machinery
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; algorithms; design; experimentation; performance; graphical processing units; CUDA; load balancing; time synchronization; what-if decision tree; supercomputing
OSTI Identifier:
1424471

Yoginath, Srikanth B., and Perumalla, Kalyan S. Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids. United States: N. p., 2018. Web. doi:10.1145/3158669.
Yoginath, Srikanth B., & Perumalla, Kalyan S. (2018). Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids. United States. doi:10.1145/3158669.
Yoginath, Srikanth B., and Perumalla, Kalyan S. 2018. "Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids". United States. doi:10.1145/3158669.
@article{osti_1424471,
title = {Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids},
author = {Yoginath, Srikanth B. and Perumalla, Kalyan S.},
abstractNote = {Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.},
doi = {10.1145/3158669},
journal = {ACM Transactions on Modeling and Computer Simulation},
number = 1,
volume = 28,
place = {United States},
year = {2018},
month = {1}
}