U.S. Department of Energy
Office of Scientific and Technical Information

Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs

Conference · Proceedings of the 3rd International IEEE Workshop on Extreme Scale Programming Models and Middleware
 [1];  [2];  [2];  [2]
  1. Univ. of Utah, Salt Lake City, UT (United States); University of Utah
  2. Univ. of Utah, Salt Lake City, UT (United States)

Large-scale parallel applications with complex global data dependencies beyond those of reductions pose significant scalability challenges in an asynchronous runtime system. Internodal challenges include identifying the all-to-all communication of data dependencies among the nodes. Intranodal challenges include gathering these data dependencies into usable data objects while avoiding data duplication. This paper addresses these challenges within the context of a large-scale, industrial coal boiler simulation using the Uintah asynchronous many-task runtime system on GPU architectures. We show a significant reduction in time spent analyzing data dependencies through refinements in our dependency search algorithm. Multiple task graphs are used to eliminate subsequent analysis when task graphs change in predictable and repeatable ways. A combined data store and task scheduler redesign reduces data dependency duplication, ensuring that problems fit within host and GPU memory. Furthermore, these modifications did not require any changes to application code or sweeping changes to the Uintah runtime system. We report results running on the DOE Titan system on 119K CPU cores and 7.5K GPUs simultaneously. Our solutions can be generalized to other task dependency problems with global dependencies among thousands of nodes that must be processed efficiently at large scale.
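
The multiple-task-graph idea in the abstract can be illustrated with a minimal sketch. It assumes a simulation whose task graph alternates between a small number of known variants on a fixed schedule, so each variant's dependency analysis needs to run only once. Every name here (`TaskGraphCache`, `analyze_dependencies`, `graph_kind`, `radiation_interval`, and the task names) is hypothetical and chosen for illustration; none of it is Uintah's API or the paper's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A graph is modeled as a list of (producer task, consumer task) edges.
Graph = List[Tuple[str, str]]


class TaskGraphCache:
    """Analyze each distinct task graph once and reuse it on later timesteps."""

    def __init__(self, analyze: Callable[[str], Graph]) -> None:
        self._analyze = analyze
        self._graphs: Dict[str, Graph] = {}
        self.analysis_count = 0  # how many times the expensive analysis ran

    def get(self, kind: str) -> Graph:
        # Run the analysis only the first time a graph variant is requested.
        if kind not in self._graphs:
            self._graphs[kind] = self._analyze(kind)
            self.analysis_count += 1
        return self._graphs[kind]


def graph_kind(timestep: int, radiation_interval: int) -> str:
    """Assumed schedule: one graph variant recurs every `radiation_interval` steps."""
    if timestep % radiation_interval == 0:
        return "with_radiation"
    return "without_radiation"


def analyze_dependencies(kind: str) -> Graph:
    """Hypothetical stand-in for an expensive global dependency search."""
    edges: Graph = [("advect", "pressure_solve")]
    if kind == "with_radiation":
        # Placeholder for a task with global (all-to-all) data dependencies.
        edges.append(("radiation", "advect"))
    return edges


def run(num_timesteps: int, radiation_interval: int = 20) -> TaskGraphCache:
    cache = TaskGraphCache(analyze_dependencies)
    for step in range(num_timesteps):
        graph = cache.get(graph_kind(step, radiation_interval))
        # Task execution over `graph` is omitted in this sketch.
        _ = graph
    return cache
```

In this sketch, `run(100).analysis_count` stays at the number of distinct graph variants rather than growing with the number of timesteps, which is the saving the abstract attributes to reusing task graphs that change in predictable and repeatable ways.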

Research Organization:
Univ. of Utah, Salt Lake City, UT (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
DOE Contract Number:
NA0002375; AC05-00OR22725
OSTI ID:
1582428
Journal Information:
Journal Name: Proceedings of the 3rd International IEEE Workshop on Extreme Scale Programming Models and Middleware
Country of Publication:
United States
Language:
English
