U.S. Department of Energy
Office of Scientific and Technical Information

Optimal size of the block in block GMRES on GPUs: computational model and experiments

Journal Article · Numerical Algorithms
 [1];  [2];  [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research
  2. Temple Univ., Philadelphia, PA (United States)

The block version of GMRES (BGMRES) is most advantageous over its single right-hand side (RHS) counterpart when the cost of communication is high and the cost of floating-point operations is low. This is the case on modern graphics processing units (GPUs), but generally not on traditional central processing units (CPUs). In this paper, experiments on both GPUs and CPUs compare the performance of BGMRES against GMRES as the number of RHS increases, with a particular focus on GPU performance. The experiments indicate that there are many cases in which BGMRES is slower than GMRES on CPUs but faster on GPUs. Furthermore, when the number of RHS is varied on the GPU, there is an optimal number of RHS at which BGMRES is most advantageous over GMRES. A computational model for the GPU is developed using hardware-specific parameters; it provides insight into how the qualitative behavior of BGMRES changes as the number of RHS increases and helps explain the phenomena observed in the experiments.
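The trade-off between communication and floating-point cost can be illustrated with a generic roofline-style estimate for the sparse matrix product at the core of each iteration. This is only a minimal sketch and is not the computational model developed in the paper. It assumes CSR storage and ideal reuse of the vector block, and the function names and hardware numbers are hypothetical. It also ignores orthogonalization and convergence effects, so it shows only how matrix traffic is amortized over the block, not where the optimal number of RHS lies.

```python
def spmm_time(n, nnz, s, bandwidth, peak_flops, bytes_val=8, bytes_idx=4):
    """Roofline estimate (seconds) for a CSR matrix times a block of s vectors.

    Assumption: every matrix entry and every vector entry moves through
    memory exactly once (ideal caching), which real kernels only approximate.
    """
    # Matrix traffic does not depend on s: values, column indices, row pointers.
    matrix_bytes = nnz * (bytes_val + bytes_idx) + (n + 1) * bytes_idx
    # Vector traffic grows with s: read the input block, write the output block.
    vector_bytes = 2 * n * s * bytes_val
    # One multiply and one add per stored entry per right-hand side.
    flops = 2 * nnz * s
    memory_time = (matrix_bytes + vector_bytes) / bandwidth
    compute_time = flops / peak_flops
    return max(memory_time, compute_time)


def block_speedup(n, nnz, s, bandwidth, peak_flops):
    """Time of s separate single-vector products over one block product."""
    single = spmm_time(n, nnz, 1, bandwidth, peak_flops)
    block = spmm_time(n, nnz, s, bandwidth, peak_flops)
    return s * single / block


if __name__ == "__main__":
    # Hypothetical problem size and hardware rates, chosen only for illustration.
    n, nnz = 1_000_000, 10_000_000
    bandwidth, peak_flops = 1.0e12, 1.0e13  # bytes/s and flop/s
    for s in (1, 2, 4, 8, 16, 32):
        print(s, round(block_speedup(n, nnz, s, bandwidth, peak_flops), 2))
```

In this sketch the estimated gain from blocking grows with the number of RHS while the product is limited by memory bandwidth, and it disappears when the product is limited by floating-point throughput.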

Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
NA0003525
OSTI ID:
2311786
Report Number(s):
SAND--2023-10797J
Journal Information:
Numerical Algorithms, Vol. 92; ISSN 1017-1398
Publisher:
Springer
Country of Publication:
United States
Language:
English
