OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Acceleration of GPU-based Krylov solvers via data transfer reduction

Journal Article · International Journal of High Performance Computing Applications
 [1];  [1];  [1];  [2];  [3]
  1. Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.
  2. Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland)
  3. Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)

Krylov subspace iterative solvers are often the method of choice for solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating-point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, because these libraries usually provide a well-optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods on GPUs, in particular the Biconjugate Gradient Stabilized (BiCGSTAB) solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of generic BLAS kernels (e.g., those provided by NVIDIA’s cuBLAS library), and by designing a GPU-specific sparse matrix-vector product kernel that uses the GPU's computing power more efficiently. Furthermore, we derive a model estimating the performance improvement and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as the sparse matrix-vector product, are crucial for the subsequent development of high-performance GPU-accelerated Krylov subspace iterative methods.
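To make the data-communication argument concrete, here is a minimal sketch, in plain Python rather than GPU code, of how fusing two standard BiCGSTAB vector updates reduces memory traffic compared with chaining generic BLAS-1 calls. It assumes a purely bandwidth-bound cost model in which every separate kernel call reads its operands from and writes its results to main memory with no reuse between calls. The counter class and all function names (`axpy`, `scal`, `copy`, `p_update_fused`, and so on) are hypothetical illustrations; they are neither the authors' kernels, nor their performance model, nor the cuBLAS API.

```python
# Hypothetical bandwidth-only traffic model (an assumption, not the paper's
# model): every kernel call streams its vector operands through main memory,
# with no reuse across calls.


class Traffic:
    """Counts vector elements read plus written across kernel calls."""

    def __init__(self):
        self.words = 0


def axpy(ctr, a, x, y):
    """y <- a*x + y; reads x and y, writes y (3n words)."""
    for i in range(len(y)):
        y[i] = a * x[i] + y[i]
    ctr.words += 3 * len(y)


def scal(ctr, a, x):
    """x <- a*x; reads x, writes x (2n words)."""
    for i in range(len(x)):
        x[i] = a * x[i]
    ctr.words += 2 * len(x)


def copy(ctr, x, y):
    """y <- x; reads x, writes y (2n words)."""
    for i in range(len(x)):
        y[i] = x[i]
    ctr.words += 2 * len(x)


def p_update_blas(ctr, p, r, v, beta, omega):
    """BiCGSTAB p <- r + beta*(p - omega*v) as three BLAS-1 calls (8n words)."""
    axpy(ctr, -omega, v, p)  # p <- p - omega*v
    scal(ctr, beta, p)       # p <- beta*p
    axpy(ctr, 1.0, r, p)     # p <- r + p


def p_update_fused(ctr, p, r, v, beta, omega):
    """Same update in one pass: reads r, p, v and writes p (4n words)."""
    for i in range(len(p)):
        p[i] = r[i] + beta * (p[i] - omega * v[i])
    ctr.words += 4 * len(p)


def xr_update_blas(ctr, x, r, p, s, t, alpha, omega):
    """x <- x + alpha*p + omega*s and r <- s - omega*t via BLAS-1 (11n words)."""
    axpy(ctr, alpha, p, x)   # x <- x + alpha*p
    axpy(ctr, omega, s, x)   # x <- x + omega*s
    copy(ctr, s, r)          # r <- s
    axpy(ctr, -omega, t, r)  # r <- r - omega*t


def xr_update_fused(ctr, x, r, p, s, t, alpha, omega):
    """Both updates in one pass: reads x, p, s, t; writes x, r (6n words)."""
    for i in range(len(x)):
        x[i] = x[i] + alpha * p[i] + omega * s[i]
        r[i] = s[i] - omega * t[i]
    ctr.words += 6 * len(x)


if __name__ == "__main__":
    n = 1000
    r = [float(i % 7) for i in range(n)]
    v = [0.5 * i for i in range(n)]
    p_blas = [1.0 + i for i in range(n)]
    p_fused = list(p_blas)
    c_blas, c_fused = Traffic(), Traffic()
    p_update_blas(c_blas, p_blas, r, v, beta=0.3, omega=0.7)
    p_update_fused(c_fused, p_fused, r, v, beta=0.3, omega=0.7)
    print(c_blas.words, c_fused.words)  # prints: 8000 4000
```

Under these assumptions, fusing halves the traffic of the p-update (4n versus 8n words) and cuts the combined x/r update from 11n to 6n words. Actual savings on a GPU also depend on caching, kernel-launch overhead, scalar transfers for dot products, and the sparse matrix-vector product, none of which this sketch models.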

Research Organization:
Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); Russian Scientific Fund (Russian Federation)
Contributing Organization:
Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland); Univ. of Manchester (United Kingdom)
Grant/Contract Number:
SC0010042; ACI-1339822; N14-11-00190
OSTI ID:
1361293
Journal Information:
International Journal of High Performance Computing Applications, Vol. 29, Issue 3; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 11 works
Citation information provided by Web of Science

Cited By (1)

A review of CUDA optimization techniques and tools for structured grid computing journal July 2019

Similar Records

Batched matrix computations on hardware accelerators based on GPUs
Journal Article · February 9, 2015 · International Journal of High Performance Computing Applications

Performance Portable Batched Sparse Linear Solvers
Journal Article · May 1, 2023 · IEEE Transactions on Parallel and Distributed Systems

A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors
Journal Article · March 15, 2016 · Journal of Computational Physics