Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime
The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall-clock execution time, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contributions of this work are to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
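As an illustration of why large halos make task dependencies expensive to process, the following minimal C++ sketch finds the neighboring patches a single patch depends on for a given halo width. It is not Uintah code: the types `Box` and `HaloDep` and the functions `grow`, `intersect`, `haloDependencies`, and `makeRow` are hypothetical names introduced here, and the brute-force loop over all patches is an assumption made for brevity rather than the runtime's actual search strategy.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; these are not Uintah classes.
struct Box {
    std::array<int, 3> lo;  // inclusive low cell index
    std::array<int, 3> hi;  // exclusive high cell index
};

struct HaloDep {
    int neighbor;  // index of the patch that owns the halo cells
    Box region;    // cells that would be copied from that neighbor
};

// Grow a box by `halo` cells in every direction.
Box grow(const Box& b, int halo) {
    Box g = b;
    for (int d = 0; d < 3; ++d) {
        g.lo[d] -= halo;
        g.hi[d] += halo;
    }
    return g;
}

// Intersect two boxes; returns false when the overlap is empty.
bool intersect(const Box& a, const Box& b, Box& out) {
    for (int d = 0; d < 3; ++d) {
        out.lo[d] = std::max(a.lo[d], b.lo[d]);
        out.hi[d] = std::min(a.hi[d], b.hi[d]);
        if (out.lo[d] >= out.hi[d]) return false;
    }
    return true;
}

// List every other patch that overlaps `self` grown by the halo width.
// Each entry corresponds to one halo transfer the runtime would schedule.
std::vector<HaloDep> haloDependencies(const std::vector<Box>& patches,
                                      int self, int halo) {
    assert(self >= 0 && self < static_cast<int>(patches.size()));
    std::vector<HaloDep> deps;
    const Box needed = grow(patches[self], halo);
    for (int p = 0; p < static_cast<int>(patches.size()); ++p) {
        if (p == self) continue;
        Box overlap;
        if (intersect(needed, patches[p], overlap)) {
            deps.push_back({p, overlap});
        }
    }
    return deps;
}

// Build a row of `n` patches, each 4x4x4 cells, laid out along x.
std::vector<Box> makeRow(int n) {
    std::vector<Box> patches;
    for (int i = 0; i < n; ++i) {
        patches.push_back({{4 * i, 0, 0}, {4 * (i + 1), 4, 4}});
    }
    return patches;
}
```

In this sketch, a row of eight patches with a one-cell halo gives an interior patch two dependencies, while a 16-cell halo makes the same patch depend on all seven others. The dependency count per variable therefore grows with the halo extent, which is the situation the abstract describes for halos spanning most of the domain.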
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- 1337135; NA0002375; CSC188; AC05-00OR22725
- OSTI ID:
- 1567537
- Journal Information:
- International Journal of Parallel Programming, Vol. 47, Issue 5-6; ISSN 0885-7458
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English