Acceleration of GPU-based Krylov solvers via data transfer reduction
- Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.
- Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland)
- Univ. of Tennessee, Knoxville, TN (United States). Innovative Computing Lab.; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver that significant improvement can be achieved by reformulating the method to reduce data-communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing units accelerated Krylov subspace iterative methods.
- Research Organization:
- Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); Russian Scientific Fund (Russian Federation)
- Contributing Organization:
- Swiss National Supercomputing Centre (CSCS), Lugano (Switzerland); Univ. of Manchester (United Kingdom)
- Grant/Contract Number:
- SC0010042; ACI-1339822; N14-11-00190
- OSTI ID:
- 1361293
- Journal Information:
- International Journal of High Performance Computing Applications, Vol. 29, Issue 3; ISSN 1094-3420
- Publisher:
- SAGECopyright Statement
- Country of Publication:
- United States
- Language:
- English
Web of Science
A review of CUDA optimization techniques and tools for structured grid computing
|
journal | July 2019 |
Similar Records
Performance Portable Batched Sparse Linear Solvers
A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors