U.S. Department of Energy
Office of Scientific and Technical Information

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

Journal Article · Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Haidar, Azzam [1]; Bayraktar, Harun [1]; Tomov, Stanimire [2]; Dongarra, Jack [3]; Higham, Nicholas J. [4]
  1. NVIDIA, Santa Clara, CA (United States)
  2. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science
  3. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division; Univ. of Manchester (United Kingdom). Dept. of Mathematics
  4. Univ. of Manchester (United Kingdom). Dept. of Mathematics
Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4×-5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
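The classical iterative-refinement loop the abstract builds on can be sketched in NumPy/SciPy. This is a simplified illustration, not the paper's implementation: float32 stands in for the FP16/FP32 tensor-core LU factorization, and a plain triangular solve replaces the preconditioned GMRES correction step that the paper uses for harder systems. The function name and tolerances are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refine(A, b, tol=1e-12, max_iter=50):
    """Solve Ax = b via a low-precision LU factorization refined in FP64."""
    # Factorize a low-precision copy of A once (float32 here plays the
    # role of the FP16/FP32 tensor-core factorization in the paper).
    A_lo = A.astype(np.float32)
    lu, piv = lu_factor(A_lo)
    # Initial low-precision solve, promoted to FP64 as the working solution.
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        # Residual computed in FP64 -- this is what restores FP64 accuracy.
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction solve reuses the cheap low-precision factors.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x
```

The key design point mirrored from the paper: the expensive O(n^3) factorization runs once in low precision, while each refinement step costs only O(n^2), so FP64 accuracy is recovered at near-FP16 factorization speed whenever the iteration converges.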
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
OSTI ID:
1787013
Journal Information:
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 476, Issue 2243; ISSN 1364-5021
Publisher:
The Royal Society Publishing
Country of Publication:
United States
Language:
English
