U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing the hypre solver for manycore and GPU architectures

Journal Article · 2021 · Journal of Computational Science
Sahasrabudhe, Damodar [1]; Zambre, Rohit [2]; Chandramowlishwaran, Aparna [2]; Berzins, Martin [3]
  1. Univ. of Utah, Salt Lake City, UT (United States). SCI Inst.
  2. Univ. of California, Irvine, CA (United States). EECS
  3. Univ. of Utah, Salt Lake City, UT (United States). SCI Inst.

The solution of large-scale combustion problems with codes such as Uintah on modern computer architectures requires the use of multithreading and GPUs to achieve performance. Uintah uses a low-Mach-number approximation that requires iteratively solving a large system of linear equations. The Hypre iterative solver has solved such systems in a scalable way for Uintah, but the use of OpenMP with Hypre leads to at least a 2x slowdown due to OpenMP overheads. The proposed solution uses MPI Endpoints within Hypre, where each team of threads acts as a separate MPI rank. This approach minimizes OpenMP synchronization overhead, performs as fast as or up to 1.44x faster than Hypre's MPI-only version, and allows the rest of Uintah to be optimized using OpenMP. Profiling of the GPU version of Hypre shows the bottleneck to be the launch overhead of thousands of micro-kernels. GPU performance was improved by fusing these micro-kernels and was further optimized by using CUDA-aware MPI, resulting in an overall speedup of 1.16x to 1.44x compared to the baseline GPU implementation. These optimization strategies were published at the International Conference on Computational Science 2020 [1]. This work extends the previously published research by carrying out a second phase of communication-centered optimizations in Hypre to improve its scalability on large-scale supercomputers: an efficient non-blocking inter-thread communication scheme, communication-reducing patch assignment, and the expression of logical communication parallelism to a new version of the MPICH library that exploits the underlying network parallelism [2]. These optimizations avoid the communication bottlenecks previously observed during strong scaling and improve performance by up to 2x on 256 nodes of the Intel Knights Landing processor.
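The MPI Endpoints scheme described above lets each team of threads drive its own communication, so teams do not serialize behind a single rank's message queue. The following is a minimal conceptual sketch, not Hypre's actual code: it emulates endpoints by duplicating one communicator per thread team (the paper used an endpoints-capable MPICH instead), and names such as NUM_TEAMS and the pairwise neighbor pattern are purely illustrative.

#include <mpi.h>
#include <omp.h>

#define NUM_TEAMS 4   /* hypothetical: one "endpoint" per thread team */

int main(int argc, char **argv) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* One duplicated communicator per team: messages from different teams
     travel on different communicators and need not serialize. */
  MPI_Comm team_comm[NUM_TEAMS];
  for (int t = 0; t < NUM_TEAMS; ++t)
    MPI_Comm_dup(MPI_COMM_WORLD, &team_comm[t]);

  #pragma omp parallel num_threads(NUM_TEAMS)
  {
    int team = omp_get_thread_num();  /* simplification: one thread per team */
    int peer = rank ^ 1;              /* illustrative pairwise exchange      */
    if (peer < size) {
      double sendbuf[128] = {0}, recvbuf[128];
      MPI_Request req;
      /* Each team posts its own exchange; an endpoints-aware MPI can drive
         these concurrently over the network hardware's parallel paths. */
      MPI_Isend(sendbuf, 128, MPI_DOUBLE, peer, team, team_comm[team], &req);
      MPI_Recv(recvbuf, 128, MPI_DOUBLE, peer, team, team_comm[team],
               MPI_STATUS_IGNORE);
      MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
  }

  for (int t = 0; t < NUM_TEAMS; ++t)
    MPI_Comm_free(&team_comm[t]);
  MPI_Finalize();
  return 0;
}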
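The GPU bottleneck comes from launching thousands of tiny kernels, one per box or patch, each paying a fixed launch latency. Below is a hedged CUDA sketch of the fusion idea; the types and names (Box, axpy_one, axpy_fused) are invented for illustration and are not hypre's actual data structures. The point is that a single launch whose blocks each handle one box replaces per-box launches.

#include <cuda_runtime.h>

struct Box { double *y; const double *x; int n; };  /* one small vector op per box */

/* Unfused: one launch per box, so launch overhead scales with the box count. */
__global__ void axpy_one(double *y, const double *x, double a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] += a * x[i];
}

/* Fused: a single launch; block b processes box b, striding over its elements. */
__global__ void axpy_fused(const Box *boxes, double a) {
  Box b = boxes[blockIdx.x];
  for (int i = threadIdx.x; i < b.n; i += blockDim.x)
    b.y[i] += a * b.x[i];
}

void apply_all(const Box *d_boxes, double a, int nboxes, cudaStream_t s) {
  /* One kernel launch amortizes the overhead formerly paid nboxes times. */
  axpy_fused<<<nboxes, 256, 0, s>>>(d_boxes, a);
}

CUDA-aware MPI complements this by accepting device pointers directly in calls such as MPI_Isend and MPI_Recv, which removes the separate host staging copies around each halo exchange.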

Research Organization:
Univ. of Utah, Salt Lake City, UT (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0002375; AC02-06CH11357
OSTI ID:
1850315
Alternate ID(s):
OSTI ID: 1780319
Journal Information:
Journal of Computational Science, Vol. 49, Issue C; ISSN 1877-7503
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

References (12)

Scaling Hypre’s Multigrid Solvers to 100,000 Cores book January 2012
Demonstrating GPU code portability and scalability for radiative heat transfer computations journal July 2018
Large Scale Parallel Solution of Incompressible Flow Problems Using Uintah and Hypre (Schmidt, J.; Berzins, M.; Thornock, J.) conference May 2013 · 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) · https://doi.org/10.1109/CCGrid.2013.10
Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs conference September 2019
An Evaluation of An Asynchronous Task Based Dataflow Approach For Uintah conference July 2019
Modeling the Performance of an Algebraic Multigrid Cycle Using Hybrid MPI/OpenMP conference September 2012
Scalable Communication Endpoints for MPI+Threads Applications conference December 2018
Communication Avoiding Multigrid Preconditioned Conjugate Gradient Method for Extreme Scale Multiphase CFD Simulations conference November 2018
Extending the Uintah Framework through the Petascale Modeling of Detonation in Arrays of High Explosive Devices journal January 2016
Pursuing scalability for hypre's conceptual interfaces journal September 2005
Enabling MPI interoperability through flexible communication endpoints conference January 2013
Enabling communication concurrency through flexible MPI endpoints journal September 2014

Similar Records

Deploy threading in Nalu solver stack
Technical Report · October 2018 · OSTI ID: 1481562

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems
Conference · 2011 · OSTI ID: 1407109

Designing and prototyping extensions to the Message Passing Interface in MPICH
Journal Article · August 2024 · International Journal of High Performance Computing Applications · OSTI ID: 2571429