
Title: Optimizing the hypre solver for manycore and GPU architectures

Journal Article · Journal of Computational Science
Authors: [1]; [2]; [2]; [1]
  1. Univ. of Utah, Salt Lake City, UT (United States). SCI Inst.
  2. Univ. of California, Irvine, CA (United States). EECS

The solution of large-scale combustion problems with codes such as Uintah on modern computer architectures requires the use of multithreading and GPUs to achieve performance. Uintah uses a low-Mach-number approximation that requires iteratively solving a large system of linear equations. The Hypre iterative solver has solved such systems in a scalable way for Uintah, but using OpenMP within Hypre leads to a slowdown due to OpenMP overheads. The proposed solution uses MPI Endpoints within Hypre, where each team of threads acts as a different MPI rank. This approach minimizes OpenMP synchronization overhead, performs as fast as or faster (up to 1.44x) than Hypre's MPI-only version, and allows the rest of Uintah to be optimized using OpenMP. Profiling the GPU version of Hypre shows the bottleneck to be the launch overhead of thousands of micro-kernels. The GPU performance was improved by fusing these micro-kernels and further optimized by using CUDA-aware MPI, resulting in an overall speedup of 1.16x–1.44x compared to the baseline GPU implementation. These optimization strategies were published at the International Conference on Computational Science 2020 [1]. This work extends that previously published research by carrying out a second phase of communication-centered optimizations in Hypre to improve its scalability on large-scale supercomputers. These optimizations include an efficient non-blocking inter-thread communication scheme, a communication-reducing patch assignment, and the expression of logical communication parallelism to a new version of the MPICH library that exploits the underlying network parallelism [2]. Together they avoid the communication bottlenecks previously observed during strong scaling and improve performance by up to 2x on 256 nodes of the Intel Knights Landing processor.
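The "each team of threads acts as a different MPI rank" idea can be sketched with standard MPI constructs. The listing below is an illustrative approximation, not code from Hypre or Uintah: it gives every OpenMP thread team its own duplicated communicator so that a multi-VCI MPI library (such as the MPICH work cited as [2]) can route each team's traffic through an independent network context. The team count and the toy neighbor exchange are assumptions made for the example.

// Sketch: emulating "MPI Endpoints" with one communicator per thread team.
// Illustrative only; not the actual Hypre/Uintah implementation.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int teams = 4;                              // hypothetical number of thread teams per rank
    std::vector<MPI_Comm> team_comm(teams);
    for (int t = 0; t < teams; ++t)
        MPI_Comm_dup(MPI_COMM_WORLD, &team_comm[t]);  // one "endpoint" per team

    #pragma omp parallel num_threads(teams)
    {
        int t = omp_get_thread_num();
        int peer = (rank + 1) % nranks;               // toy neighbor exchange
        double sbuf = rank * 100.0 + t, rbuf = -1.0;
        // Each team posts its own non-blocking exchange on its own communicator,
        // so threads do not serialize on a shared communication context.
        MPI_Request reqs[2];
        MPI_Irecv(&rbuf, 1, MPI_DOUBLE, MPI_ANY_SOURCE, t, team_comm[t], &reqs[0]);
        MPI_Isend(&sbuf, 1, MPI_DOUBLE, peer, t, team_comm[t], &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        std::printf("rank %d team %d received %.1f\n", rank, t, rbuf);
    }

    for (int t = 0; t < teams; ++t) MPI_Comm_free(&team_comm[t]);
    MPI_Finalize();
    return 0;
}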
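The kernel-fusion optimization can be made concrete with a short CUDA sketch. It contrasts launching one tiny kernel per patch, where launch overhead dominates, with a single fused launch in which each thread block processes one patch descriptor. The Patch struct and the axpy-style work are hypothetical stand-ins, not Hypre's actual micro-kernels.

// Sketch: replacing many per-patch "micro-kernel" launches with one fused launch.
#include <cuda_runtime.h>

struct Patch {              // hypothetical per-patch work descriptor
    double *x, *y;
    double alpha;
    int n;
};

// Unfused version: one tiny kernel launch per patch, so launch overhead dominates.
__global__ void axpy_patch(double *y, const double *x, double alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += alpha * x[i];
}

// Fused version: a single launch; each block reads one patch descriptor and does
// the same small amount of work, amortizing one launch over all patches.
__global__ void axpy_fused(const Patch *patches, int npatches) {
    const Patch p = patches[blockIdx.x];
    for (int i = threadIdx.x; i < p.n; i += blockDim.x)
        p.y[i] += p.alpha * p.x[i];
}

void run_unfused(const Patch *h_patches, int npatches, cudaStream_t s) {
    for (int p = 0; p < npatches; ++p) {              // thousands of launches
        const Patch &pp = h_patches[p];
        int blocks = (pp.n + 255) / 256;
        axpy_patch<<<blocks, 256, 0, s>>>(pp.y, pp.x, pp.alpha, pp.n);
    }
}

void run_fused(const Patch *d_patches, int npatches, cudaStream_t s) {
    axpy_fused<<<npatches, 256, 0, s>>>(d_patches, npatches);  // one launch total
}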

Research Organization:
Univ. of Utah, Salt Lake City, UT (United States); Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0002375; AC02-06CH11357
OSTI ID:
1850315
Alternate ID(s):
OSTI ID: 1780319
Journal Information:
Journal of Computational Science, Vol. 49, Issue C; ISSN 1877-7503
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

References (12)

Large Scale Parallel Solution of Incompressible Flow Problems Using Uintah and Hypre conference May 2013
  • Schmidt, J.; Berzins, M.; Thornock, J.
  • 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) https://doi.org/10.1109/CCGrid.2013.10
Demonstrating GPU code portability and scalability for radiative heat transfer computations journal July 2018
Extending the Uintah Framework through the Petascale Modeling of Detonation in Arrays of High Explosive Devices journal January 2016
Enabling MPI interoperability through flexible communication endpoints conference January 2013
Pursuing scalability for hypre's conceptual interfaces journal September 2005
Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs conference September 2019
Scaling Hypre’s Multigrid Solvers to 100,000 Cores book January 2012
Enabling communication concurrency through flexible MPI endpoints journal September 2014
An Evaluation of An Asynchronous Task Based Dataflow Approach For Uintah conference July 2019
Scalable Communication Endpoints for MPI+Threads Applications conference December 2018
Communication Avoiding Multigrid Preconditioned Conjugate Gradient Method for Extreme Scale Multiphase CFD Simulations conference November 2018
Modeling the Performance of an Algebraic Multigrid Cycle Using Hybrid MPI/OpenMP conference September 2012
