Acceleration of GPU-based Krylov solvers via data transfer reduction
Abstract
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as the sparse matrix-vector product, are crucial for the subsequent development of high-performance graphics processing unit accelerated Krylov subspace iterative methods.
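The idea of replacing generic BLAS calls with application-specific kernels can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the `Counter` helper, and the read-counting model are assumptions made for illustration, and plain Python loops stand in for GPU kernels. The example uses the standard BiCGSTAB step ω = (t, s) / (t, t) and compares two separate dot-product calls with one merged pass that computes both reductions while loading each vector element once.

```python
# Illustrative sketch only: plain Python loops stand in for GPU kernels,
# and Counter is a hypothetical helper that tallies vector-element reads.

class Counter:
    def __init__(self):
        self.reads = 0


def dot(x, y, counter):
    """Generic BLAS-style dot product: one full pass over x and y."""
    counter.reads += 2 * len(x)
    return sum(a * b for a, b in zip(x, y))


def omega_unfused(t, s, counter):
    """omega = (t, s) / (t, t) using two independent dot calls."""
    return dot(t, s, counter) / dot(t, t, counter)


def omega_fused(t, s, counter):
    """Same omega, but both reductions share a single pass over t and s."""
    ts = 0.0
    tt = 0.0
    for ti, si in zip(t, s):
        ts += ti * si
        tt += ti * ti
    counter.reads += 2 * len(t)  # each element of t and s is loaded once
    return ts / tt


if __name__ == "__main__":
    t = [1.0, 2.0, 3.0]
    s = [4.0, 5.0, 6.0]
    c_unfused, c_fused = Counter(), Counter()
    print(omega_unfused(t, s, c_unfused), c_unfused.reads)
    print(omega_fused(t, s, c_fused), c_fused.reads)
```

In this toy model the merged pass halves the number of vector-element reads for the ω step. On a real accelerator the benefit would also depend on kernel launch overhead and memory behavior, which this sketch does not model.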
- Authors:
- Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; Sawyer, William; Dongarra, Jack
- Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.
- Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland)
- Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)
- Publication Date:
- 2015-04-08
- Research Org.:
- Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); Russian Scientific Fund (Russian Federation)
- Contributing Org.:
- Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland); Univ. of Manchester (United Kingdom)
- OSTI Identifier:
- 1361293
- Grant/Contract Number:
- SC0010042; ACI-1339822; N14-11-00190
- Resource Type:
- Accepted Manuscript
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Volume: 29; Journal Issue: 3; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Krylov Subspace Methods; Iterative Solvers; Sparse Linear Systems; Graphics Processing Units; BiCGSTAB
Citation Formats
Anzt, Hartwig, Tomov, Stanimire, Luszczek, Piotr, Sawyer, William, and Dongarra, Jack. Acceleration of GPU-based Krylov solvers via data transfer reduction. United States: N. p., 2015. Web. doi:10.1177/1094342015580139.
Anzt, Hartwig, Tomov, Stanimire, Luszczek, Piotr, Sawyer, William, & Dongarra, Jack. Acceleration of GPU-based Krylov solvers via data transfer reduction. United States. https://doi.org/10.1177/1094342015580139
Anzt, Hartwig, Tomov, Stanimire, Luszczek, Piotr, Sawyer, William, and Dongarra, Jack. 2015. "Acceleration of GPU-based Krylov solvers via data transfer reduction". United States. https://doi.org/10.1177/1094342015580139. https://www.osti.gov/servlets/purl/1361293.
@article{osti_1361293,
title = {Acceleration of GPU-based Krylov solvers via data transfer reduction},
author = {Anzt, Hartwig and Tomov, Stanimire and Luszczek, Piotr and Sawyer, William and Dongarra, Jack},
abstractNote = {Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as the sparse matrix-vector product, are crucial for the subsequent development of high-performance graphics processing unit accelerated Krylov subspace iterative methods.},
doi = {10.1177/1094342015580139},
journal = {International Journal of High Performance Computing Applications},
number = 3,
volume = 29,
place = {United States},
year = {2015},
month = {4}
}
Works referenced in this record:
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems
journal, November 2007
- Buttari, Alfredo; Dongarra, Jack; Langou, Julie
- The International Journal of High Performance Computing Applications, Vol. 21, Issue 4
Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
conference, May 2011
- Buluç, Aydin; Williams, Samuel; Oliker, Leonid
- 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)
CPU and GPU Performance of Large Scale Numerical Simulations in Geophysics
book, January 2014
- Dorostkar, Ali; Lukarski, Dimitar; Lund, Björn
- Lecture Notes in Computer Science
Model-driven autotuning of sparse matrix-vector multiply on GPUs
journal, May 2010
- Choi, Jee W.; Singh, Amik; Vuduc, Richard W.
- ACM SIGPLAN Notices, Vol. 45, Issue 5
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
conference, May 2014
- Yamazaki, Ichitaro; Anzt, Hartwig; Tomov, Stanimire
- 2014 IEEE 28th International Parallel & Distributed Processing Symposium (IPDPS)
Model-driven autotuning of sparse matrix-vector multiply on GPUs
conference, January 2010
- Choi, Jee W.; Singh, Amik; Vuduc, Richard W.
- Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10
Accelerating scientific computations with mixed precision algorithms
journal, December 2009
- Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack
- Computer Physics Communications, Vol. 180, Issue 12
Performance and Energy-Aware Characterization of the Sparse Matrix-Vector Multiplication on Multithreaded Architectures
conference, September 2014
- Malossi, A. C. I.; Ineichen, Y.; Bekas, C.
- 2014 43rd International Conference on Parallel Processing Workshops (ICPPW)
A Fan-In Algorithm for Distributed Sparse Numerical Factorization
journal, May 1990
- Ashcraft, Cleve; Eisenstat, Stanley C.; Liu, Joseph W. H.
- SIAM Journal on Scientific and Statistical Computing, Vol. 11, Issue 3
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
book, January 2010
- Monakov, Alexander; Lokhmotov, Anton; Avetisyan, Arutyun
- High Performance Embedded Architectures and Compilers
GPU-accelerated preconditioned iterative linear solvers
journal, October 2012
- Li, Ruipeng; Saad, Yousef
- The Journal of Supercomputing, Vol. 63, Issue 2
Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
journal, March 1992
- van der Vorst, H. A.
- SIAM Journal on Scientific and Statistical Computing, Vol. 13, Issue 2
Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
book, January 1994
- Barrett, Richard; Berry, Michael; Chan, Tony F.
Optimizing Krylov Subspace Solvers on Graphics Processing Units
conference, May 2014
- Anzt, Hartwig; Sawyer, William; Tomov, Stanimire
- 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs
conference, October 2013
- Aliaga, Jose I.; Perez, Joaquin; Quintana-Orti, Enrique S.
- 2013 42nd International Conference on Parallel Processing (ICPP)
Methods of conjugate gradients for solving linear systems
journal, December 1952
- Hestenes, M. R.; Stiefel, E.
- Journal of Research of the National Bureau of Standards, Vol. 49, Issue 6
Energy efficiency and performance frontiers for sparse computations on GPU supercomputers
conference, January 2015
- Anzt, Hartwig; Tomov, Stanimire; Dongarra, Jack
- Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15
Finite Elements
book, January 2008
- Numerical Approximation Methods for Elliptic Boundary Value Problems
Sparse Matrix-Vector Multiplication on Multicore and Accelerators
book, December 2010
- Williams, Samuel; Bell, Nathan; Choi, Jee Whan
- Scientific Computing with Multicore and Accelerators
Finite Elements
book, January 1980
- Pian, Theodore H. H.
- Variational Methods in the Mechanics of Solids
Accelerating Scientific Computations with Mixed Precision Algorithms
text, January 2008
- Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack
- arXiv
Works referencing / citing this record:
A review of CUDA optimization techniques and tools for structured grid computing
journal, July 2019
- Al-Mouhamed, Mayez A.; Khan, Ayaz H.; Mohammad, Nazeeruddin
- Computing, Vol. 102, Issue 4