
Title: Pushing memory bandwidth limitations through efficient implementations of Block-Krylov space solvers on GPUs

Journal Article · Computer Physics Communications
 [1]; [2]; [3]; [4]; [5]
  1. NVIDIA Corporation, Santa Clara, CA (United States)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  3. Univ. of Utah, Salt Lake City, UT (United States). Dept. of Physics and Astronomy
  4. NVIDIA GmbH, Würselen (Germany)
  5. Boston Univ., MA (United States). Dept. of Physics

Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speedup. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Here, using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
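To make the abstract's terms concrete, the following is a minimal NumPy sketch of the textbook block-CG recurrence for a real symmetric positive-definite matrix. It is not the QUDA implementation and says nothing about GPU kernels; the function name `block_cg`, its parameters, and the dense test matrix are assumptions chosen for illustration. The comments mark where the single batched matrix-vector product occurs and where the number of vector-vector operations grows quadratically with the number of right-hand sides s.

```python
import numpy as np


def block_cg(A, B, tol=1e-10, max_iter=500):
    """Illustrative block-CG: solve A X = B for SPD A and an n-by-s block B.

    Returns the solution block X and the number of iterations used.
    This is a plain sketch without breakdown handling or reliable updates.
    """
    B = np.asarray(B, dtype=float)
    X = np.zeros_like(B)
    R = B.copy()                 # block residual for the zero initial guess
    P = R.copy()                 # block search direction
    rho = R.T @ R                # s x s Gram matrix: s*s inner products
    b_norms = np.linalg.norm(B, axis=0)

    for it in range(1, max_iter + 1):
        AP = A @ P               # one batched matrix-vector product for all s vectors
        # Small s x s solve couples the right-hand sides (shared Krylov space).
        alpha = np.linalg.solve(P.T @ AP, rho)
        X += P @ alpha           # block update: s*s vector-scaled additions
        R -= AP @ alpha
        rho_new = R.T @ R
        # Stop when every column meets the relative residual tolerance.
        if np.all(np.sqrt(np.diag(rho_new)) <= tol * b_norms):
            return X, it
        beta = np.linalg.solve(rho, rho_new)
        P = R + P @ beta
        rho = rho_new
    return X, max_iter
```

Written column by column, each `P.T @ AP`, `P @ alpha`, and `P @ beta` touches every pair of vectors, which is the quadratic cost the abstract refers to; expressing them as small matrix products, as done here, is one way to let each vector be read once per operation.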

Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-07CH11359
OSTI ID:
1418147
Alternate ID(s):
OSTI ID: 1734408
Report Number(s):
arXiv:1710.09745; FERMILAB-PUB-17-592-CD; 1632766
Journal Information:
Computer Physics Communications, Vol. 233, Issue C; ISSN 0010-4655
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 5 works
Citation information provided by Web of Science
