Performance Portable Batched Sparse Linear Solvers
- Center for Computing Research, Sandia National Laboratories, Albuquerque, NM, USA
Solving large number of small linear systems is increasingly becoming a bottleneck in computational science applications. While dense linear solvers for such systems have been studied before, batched sparse linear solvers are just starting to emerge. In this paper, we discuss algorithms for solving batched sparse linear systems and their implementation in the Kokkos Kernels library. The new algorithms are performance portable and map well to the hierarchical parallelism available in modern accelerator architectures. The sparse matrix vector product (SPMV) kernel is the main performance bottleneck of the Krylov solvers we implement in this work. The implementation of the batched SPMV and its performance are therefore discussed thoroughly in this paper. The implemented kernels are tested on different Central Processing Unit (CPU) and Graphic Processing Unit (GPU) architectures. We also develop batched Conjugate Gradient (CG) and batched Generalized Minimum Residual (GMRES) solvers using the batched SPMV. Our proposed solver was able to solve 20,000 sparse linear systems on V100 GPUs with a mean speedup of 76x and 924x compared to using a parallel sparse solver with a block diagonal system with all the small linear systems, and compared to solving the small systems one at a time, respectively. We see mean speedup of 0.51 compared to dense batched solver of cuSOLVER on V100, while using lot less memory. Thorough performance evaluation on three different architectures and analysis of the performance are presented.
- Research Organization:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
- Grant/Contract Number:
- NA0003525
- OSTI ID:
- 1962268
- Alternate ID(s):
- OSTI ID: 1962271; OSTI ID: 2311595
- Report Number(s):
- SAND-2023-06502J; 10054414
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Journal Name: IEEE Transactions on Parallel and Distributed Systems Vol. 34 Journal Issue: 5; ISSN 1045-9219
- Publisher:
- Institute of Electrical and Electronics EngineersCopyright Statement
- Country of Publication:
- United States
- Language:
- English
Similar Records
Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking.
High performance sparse multifrontal solvers on modern GPUs