Title: Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime

Abstract

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contribution of this work is to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
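
As a rough illustration of what a halo transfer is, the sketch below splits a 1D field into patches, copies edge cells into neighbouring ghost cells, and then applies a 3-point stencil. It is a minimal sketch under stated assumptions and is not Uintah's API: the function names (`split_with_halo`, `exchange_halos`, `stencil_sum`), the 1D layout, and the zero-valued outer boundary are all hypothetical choices made for this example, and the copies happen within one process rather than over MPI or between host and GPU memory.

```python
def split_with_halo(field, num_patches, halo):
    """Split a 1D field into equal patches, padding each side with `halo` ghost cells.

    Hypothetical helper: assumes len(field) is divisible by num_patches.
    """
    interior = len(field) // num_patches
    patches = []
    for p in range(num_patches):
        cells = field[p * interior:(p + 1) * interior]
        patches.append([0.0] * halo + list(cells) + [0.0] * halo)
    return patches


def exchange_halos(patches, halo):
    """Fill ghost cells from the adjacent patch's edge interior cells.

    Outer ghost cells of the first and last patch keep their zero value
    (an assumed boundary condition for this sketch).
    """
    for left, right in zip(patches, patches[1:]):
        n_left = len(left)
        # Right patch's left ghost cells <- left patch's last interior cells.
        right[:halo] = left[n_left - 2 * halo:n_left - halo]
        # Left patch's right ghost cells <- right patch's first interior cells.
        left[n_left - halo:] = right[halo:2 * halo]


def stencil_sum(patch, halo):
    """Apply a 3-point sum stencil to the interior cells (needs halo >= 1)."""
    return [patch[i - 1] + patch[i] + patch[i + 1]
            for i in range(halo, len(patch) - halo)]


field = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
patches = split_with_halo(field, num_patches=2, halo=1)
exchange_halos(patches, halo=1)
result = [value for patch in patches for value in stencil_sum(patch, halo=1)]
```

After the exchange, the per-patch stencil results concatenate to the same values a single stencil pass over the whole zero-padded field would give. In a distributed setting, the two slice copies in `exchange_halos` would correspond to messages between nodes or copies between memory spaces, which is the work the abstract describes the runtime as automating.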

Authors:
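Peterson, Brad; Humphrey, Alan; Sunderland, Dan; Sutherland, James; Saad, Tony; Dasari, Harish; Berzins, Martin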
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1567537
DOE Contract Number:  
1337135; NA0002375; CSC188; AC05-00OR22725
Resource Type:
Journal Article
Journal Name:
International Journal of Parallel Programming
Additional Journal Information:
Journal Volume: 47; Journal Issue: 5-6; Journal ID: ISSN 0885-7458
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
Uintah; Hybrid parallelism; Parallel; GPU; Heterogeneous systems; Stencil computation; Optimization; Concurrency; Halo transfer

Citation Formats

Peterson, Brad, Humphrey, Alan, Sunderland, Dan, Sutherland, James, Saad, Tony, Dasari, Harish, and Berzins, Martin. Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime. United States: N. p., 2018. Web. doi:10.1007/s10766-018-0619-1.
Peterson, Brad, Humphrey, Alan, Sunderland, Dan, Sutherland, James, Saad, Tony, Dasari, Harish, & Berzins, Martin. Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime. United States. doi:10.1007/s10766-018-0619-1.
Peterson, Brad, Humphrey, Alan, Sunderland, Dan, Sutherland, James, Saad, Tony, Dasari, Harish, and Berzins, Martin. Fri . "Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime". United States. doi:10.1007/s10766-018-0619-1.
@article{osti_1567537,
title = {Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime},
author = {Peterson, Brad and Humphrey, Alan and Sunderland, Dan and Sutherland, James and Saad, Tony and Dasari, Harish and Berzins, Martin},
abstractNote = {The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contribution of this work is to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.},
doi = {10.1007/s10766-018-0619-1},
journal = {International Journal of Parallel Programming},
issn = {0885-7458},
number = {5-6},
volume = {47},
place = {United States},
year = {2018},
month = {12}
}
