Title: Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs

Abstract

Large-scale parallel applications with complex global data dependencies beyond those of reductions pose significant scalability challenges in an asynchronous runtime system. Internodal challenges include identifying the all-to-all communication of data dependencies among the nodes. Intranodal challenges include gathering these data dependencies into usable data objects while avoiding data duplication. This paper addresses these challenges within the context of a large-scale, industrial coal boiler simulation using the Uintah asynchronous many-task runtime system on GPU architectures. We show a significant reduction in time spent analyzing data dependencies through refinements in our dependency search algorithm. Multiple task graphs are used to eliminate subsequent analysis when task graphs change in predictable and repeatable ways. A combined data store and task scheduler redesign reduces data dependency duplication, ensuring that problems fit within host and GPU memory. Furthermore, these modifications did not require any changes to application code or sweeping changes to the Uintah runtime system. We report results running on the DOE Titan system on 119K CPU cores and 7.5K GPUs simultaneously. Our solutions can be generalized to other task dependency problems with global dependencies among thousands of nodes that must be processed efficiently at large scale.
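
The abstract's point about using multiple task graphs to avoid repeated analysis can be illustrated with a minimal sketch. It is not Uintah code: the class `TaskGraphCache`, the functions `analyze` and `phase_of`, the task names, and the assumption that a radiation step recurs every fifth timestep are all hypothetical, chosen only to show the idea of analyzing each distinct graph once and reusing it when the same pattern returns.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical types for illustration; none of these names come from Uintah.
Task = str
Edge = Tuple[Task, Task]  # (producer, consumer)


class TaskGraphCache:
    """Analyze each distinct task graph once and reuse it afterwards."""

    def __init__(self, analyze: Callable[[Hashable], List[Edge]]) -> None:
        self._analyze = analyze      # stands in for costly dependency analysis
        self._graphs: Dict[Hashable, List[Edge]] = {}
        self.analysis_runs = 0       # counts how often analysis actually ran

    def graph_for(self, phase: Hashable) -> List[Edge]:
        # Only a phase that has never been seen triggers analysis.
        if phase not in self._graphs:
            self.analysis_runs += 1
            self._graphs[phase] = self._analyze(phase)
        return self._graphs[phase]


def analyze(phase: Hashable) -> List[Edge]:
    """Stand-in for dependency analysis: returns producer -> consumer edges."""
    edges = [("advance_flow", "update_temperature")]
    if phase == "radiation":
        # Assumed example: the radiation step adds a task with global dependencies.
        edges.append(("update_temperature", "solve_radiation"))
    return edges


def phase_of(timestep: int, radiation_interval: int = 5) -> str:
    """Assumed repeating pattern: every Nth timestep is a radiation step."""
    return "radiation" if timestep % radiation_interval == 0 else "standard"


cache = TaskGraphCache(analyze)
for step in range(1, 21):
    graph = cache.graph_for(phase_of(step))  # reused after the first occurrence
```

In this sketch the twenty timesteps alternate between two graph shapes, so `analyze` is called only for the first occurrence of each shape. Keying the cache on a phase label works only when the graph changes are predictable and repeatable, which is the condition the abstract states.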

Authors:
Peterson, Brad [1]; Humphrey, Alan [1]; Schmidt, John [1]; Berzins, Martin [1]
  1. Univ. of Utah, Salt Lake City, UT (United States)
Publication Date:
Research Org.:
Univ. of Utah, Salt Lake City, UT (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Scientific User Facilities Division
OSTI Identifier:
1582428
Grant/Contract Number:  
NA0002375; AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Proceedings of the 3rd International IEEE Workshop on Extreme Scale Programming Models and Middleware
Additional Journal Information:
Conference: 3rd International IEEE Workshop on Extreme Scale Programming Models and Middleware
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Data dependencies; Asynchronous Many-Task; Programming Models; Runtime Systems; Scalability; GPU; Uintah; Coal Boiler; Radiative Heat Transfer

Citation Formats

Peterson, Brad, Humphrey, Alan, Schmidt, John, and Berzins, Martin. Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs. United States: N. p., 2017. Web. doi:10.1145/3152041.3152082.
Peterson, Brad, Humphrey, Alan, Schmidt, John, & Berzins, Martin. Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs. United States. doi:10.1145/3152041.3152082.
Peterson, Brad, Humphrey, Alan, Schmidt, John, and Berzins, Martin. Wed . "Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs". United States. doi:10.1145/3152041.3152082. https://www.osti.gov/servlets/purl/1582428.
@article{osti_1582428,
title = {Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs},
author = {Peterson, Brad and Humphrey, Alan and Schmidt, John and Berzins, Martin},
doi = {10.1145/3152041.3152082},
journal = {Proceedings of the 3rd International IEEE Workshop on Extreme Scale Programming Models and Middleware},
place = {United States},
year = {2017},
month = {11}
}
