
Title: Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime

Journal Article · International Journal of Parallel Programming

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contributions of this work are to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
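For readers unfamiliar with halo transfers, the following is a minimal sketch of the general idea described in the abstract, not Uintah's API or implementation. It assumes two neighboring one-dimensional patches on a single process, a halo width of one cell, and a three-point averaging stencil; every function and task name (`make_patch`, `exchange_halos`, `smooth`, and the task labels) is hypothetical. The small dependency graph at the end only illustrates that the halo exchange must be ordered before the tasks that read ghost cells.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def make_patch(interior, halo):
    """Build a 1D patch laid out as [ghost cells | interior | ghost cells]."""
    return [0.0] * halo + list(interior) + [0.0] * halo


def exchange_halos(left, right, halo):
    """Fill each patch's facing ghost cells from its neighbor's edge interior cells."""
    n_left = len(left) - 2 * halo
    # Rightmost interior cells of `left` become the left ghost cells of `right`.
    right[0:halo] = left[n_left:n_left + halo]
    # Leftmost interior cells of `right` become the right ghost cells of `left`.
    left[n_left + halo:] = right[halo:2 * halo]


def smooth(patch, halo):
    """Three-point average over interior cells; needs halo >= 1 to read neighbors."""
    out = list(patch)
    for i in range(halo, len(patch) - halo):
        out[i] = (patch[i - 1] + patch[i] + patch[i + 1]) / 3.0
    return out


# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_graph = {
    "exchange_halos": set(),
    "smooth_left": {"exchange_halos"},
    "smooth_right": {"exchange_halos"},
}
execution_order = list(TopologicalSorter(task_graph).static_order())

halo = 1
left = make_patch([1.0, 2.0, 3.0], halo)
right = make_patch([4.0, 5.0, 6.0], halo)
exchange_halos(left, right, halo)   # must run before either smoothing task
left_smoothed = smooth(left, halo)
right_smoothed = smooth(right, halo)
```

In a distributed, GPU-heterogeneous setting the copies inside `exchange_halos` would instead be MPI messages or host-device transfers, which is where the per-task runtime overhead discussed in the abstract arises.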

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
1337135; NA0002375; CSC188; AC05-00OR22725
OSTI ID:
1567537
Journal Information:
International Journal of Parallel Programming, Vol. 47, Issue 5-6; ISSN 0885-7458
Publisher:
Springer
Country of Publication:
United States
Language:
English
