Pushing memory bandwidth limitations through efficient implementations of Block-Krylov space solvers on GPUs
- NVIDIA Corporation, Santa Clara, CA (United States)
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Univ. of Utah, Salt Lake City, UT (United States). Dept. of Physics and Astronomy
- NVIDIA GmbH, Würselen (Germany)
- Boston Univ., MA (United States). Dept. of Physics
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speedup. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the number of iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Here, using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly optimized independent Krylov solves on NVIDIA's SaturnV cluster.
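The quadratic-to-linear claim for vector-vector operations can be illustrated with a minimal sketch. It is not taken from QUDA and does not model a GPU. It assumes real-valued vectors stored as plain Python lists, and the function names `gram_naive` and `gram_fused` are hypothetical. Both functions compute the s x s matrix of inner products that a block solver needs for s right-hand sides, and both count how many vector elements they read. The arithmetic stays quadratic in s in either case; only the number of loads changes.

```python
# Illustrative sketch only (assumption: real-valued vectors, pure Python).
# "loads" counts vector elements read, as a stand-in for memory traffic.

def gram_naive(P, Q):
    """One dot product per (i, j) pair; each pair re-reads two full vectors."""
    s, n = len(P), len(P[0])
    loads = 0
    G = [[0.0] * s for _ in range(s)]
    for i in range(s):
        for j in range(s):
            acc = 0.0
            for k in range(n):
                acc += P[i][k] * Q[j][k]
                loads += 2  # one element of P[i], one element of Q[j]
            G[i][j] = acc
    return G, loads


def gram_fused(P, Q):
    """Single pass over the sites; each vector element is read exactly once."""
    s, n = len(P), len(P[0])
    loads = 0
    G = [[0.0] * s for _ in range(s)]
    for k in range(n):
        p = [P[i][k] for i in range(s)]  # s loads, then reused s times
        q = [Q[j][k] for j in range(s)]  # s loads, then reused s times
        loads += 2 * s
        for i in range(s):
            for j in range(s):
                G[i][j] += p[i] * q[j]
    return G, loads


s, n = 4, 8  # 4 right-hand sides, 8 sites (toy sizes)
P = [[float(i + k) for k in range(n)] for i in range(s)]
Q = [[float(i * k + 1) for k in range(n)] for i in range(s)]

G_naive, loads_naive = gram_naive(P, Q)
G_fused, loads_fused = gram_fused(P, Q)
```

In this toy model the naive version reads 2·s²·n elements and the fused version reads 2·s·n, so the ratio between them grows linearly with the number of right-hand sides s. On real hardware the reuse would have to come from registers or cache, as the abstract describes for the batched matrix-vector operation.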
- Research Organization:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP); USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- AC02-07CH11359
- OSTI ID:
- 1418147
- Alternate ID(s):
- OSTI ID: 1734408
- Report Number(s):
- arXiv:1710.09745; FERMILAB-PUB-17-592-CD; 1632766
- Journal Information:
- Computer Physics Communications, Vol. 233, Issue C; ISSN 0010-4655
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Web of Science
Status and future perspectives for lattice gauge theory calculations to the exascale and beyond (journal, November 2019)
Similar Records
Acceleration of GPU-based Krylov solvers via data transfer reduction
Tensor Contraction and Operation Minimization for Extreme Scale Computational Chemistry