Acceleration of GPU-based Krylov solvers via data transfer reduction

Anzt, Hartwig ^[1]; Tomov, Stanimire ^[1]; Luszczek, Piotr ^[1]; Sawyer, William ^[2]; Dongarra, Jack ^[3]

[1] Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.
[2] Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland)
[3] Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)

Journal Article · April 8, 2015 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342015580139 · OSTI ID: 1361293

Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating-point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well-optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for GPUs, and in particular the Biconjugate Gradient Stabilized (BiCGSTAB) solver. We show that significant improvement can be achieved by reformulating the method to reduce data communication through application-specific kernels instead of generic BLAS kernels, such as those provided by NVIDIA's cuBLAS library, and by designing a GPU-specific sparse matrix-vector product kernel that uses the GPU's computing power more efficiently. Furthermore, we derive a model estimating the performance improvement and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations, addressing both the algorithm structure and the sparse matrix-vector product, are crucial for the subsequent development of high-performance GPU-accelerated Krylov subspace iterative methods.
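To make the data-transfer argument concrete, the following is a minimal sketch of the standard unpreconditioned BiCGSTAB iteration (van der Vorst's formulation) with a compressed sparse row (CSR) matrix-vector product. It is written in plain Python for readability rather than CUDA, so it runs on the CPU and says nothing about actual GPU performance. The comments mark groups of vector operations that a GPU implementation could merge into a single kernel, so that each vector is read from memory once rather than once per generic BLAS call. These groupings are an illustrative assumption, not the exact kernels of the paper, and the function names `csr_spmv` and `bicgstab` are hypothetical.

```python
import math


def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A x for A in CSR format (on a GPU, typically one row per thread)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def bicgstab(row_ptr, col_idx, vals, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiCGSTAB with x0 = 0; returns (x, iterations used)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                # r0 = b - A*x0 with x0 = 0
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0            # trivial system: x = 0 is exact
    r_hat = list(r)            # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        # Fusion candidate: p = r + beta*(p - omega*v) in one pass over r, p, v
        # instead of separate scal/axpy calls that each re-read the vectors.
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = csr_spmv(row_ptr, col_idx, vals, p)
        alpha = rho / dot(r_hat, v)
        # Fusion candidate: s = r - alpha*v, with its norm accumulated in the same pass.
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) / b_norm < tol:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, it
        t = csr_spmv(row_ptr, col_idx, vals, s)
        # Fusion candidate: both dot products (t, s) and (t, t) from one read of t and s.
        ts = tt = 0.0
        for ti, si in zip(t, s):
            ts += ti * si
            tt += ti * ti
        omega = ts / tt
        # Fusion candidate: the x and r updates share one pass over p, s, t.
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        rho_old = rho
    return x, max_iter
```

For example, the small nonsymmetric system with rows (4, 1, 0), (2, 5, 1), (0, 1, 3) and right-hand side (6, 15, 11) has the exact solution (1, 2, 3), which `bicgstab` recovers to within the tolerance. The sketch omits breakdown checks (for instance, a zero `omega` or a zero `(r_hat, v)`), which a production solver would need.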


Research Organization:: Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization:: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); Russian Scientific Fund (Russian Federation)

Contributing Organization:: Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland); Univ. of Manchester (United Kingdom)

Grant/Contract Number:: SC0010042; ACI-1339822; N14-11-00190

OSTI ID:: 1361293

Journal Information:: International Journal of High Performance Computing Applications, Vol. 29, Issue 3; ISSN 1094-3420

Publisher:: SAGE

Country of Publication:: United States

Language:: English

Citation Metrics:

Cited by: 11 works

Citation information provided by Web of Science

Cited By (1)

A review of CUDA optimization techniques and tools for structured grid computing Al-Mouhamed, Mayez A.; Khan, Ayaz H.; Mohammad, Nazeeruddin Computing, Vol. 102, Issue 4 https://doi.org/10.1007/s00607-019-00744-1	journal	July 2019

Similar Records

Batched matrix computations on hardware accelerators based on GPUs

Journal Article · February 9, 2015 · International Journal of High Performance Computing Applications

Haidar, Azzam; Dong, Tingxing; Luszczek, Piotr; et al.

Performance Portable Batched Sparse Linear Solvers

Journal Article · May 1, 2023 · IEEE Transactions on Parallel and Distributed Systems

Liegeois, Kim; Rajamanickam, Sivasankaran; Berger-Vergiat, Luc

A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

Journal Article · March 15, 2016 · Journal of Computational Physics

Alonso, Pedro; Badía, José M.; Chacón, Pablo; et al.

Related Subjects

97 MATHEMATICS AND COMPUTING
Krylov Subspace Methods
Iterative Solvers
Sparse Linear Systems
Graphics Processing Units
BiCGSTAB