OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance Portable Batched Sparse Linear Solvers

Journal Article · IEEE Transactions on Parallel and Distributed Systems

Solving a large number of small linear systems is increasingly becoming a bottleneck in computational science applications. While dense linear solvers for such systems have been studied before, batched sparse linear solvers are just starting to emerge. In this paper, we discuss algorithms for solving batched sparse linear systems and their implementation in the Kokkos Kernels library. The new algorithms are performance portable and map well to the hierarchical parallelism available in modern accelerator architectures. The sparse matrix-vector product (SPMV) kernel is the main performance bottleneck of the Krylov solvers we implement in this work. The implementation of the batched SPMV and its performance are therefore discussed thoroughly in this paper. The implemented kernels are tested on different Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures. We also develop batched Conjugate Gradient (CG) and batched Generalized Minimum Residual (GMRES) solvers using the batched SPMV. Our proposed solver was able to solve 20,000 sparse linear systems on V100 GPUs with a mean speedup of 76x compared to using a parallel sparse solver on a block diagonal system containing all the small linear systems, and 924x compared to solving the small systems one at a time. We see a mean speedup of 0.51x compared to the dense batched solver of cuSOLVER on V100, while using far less memory. A thorough performance evaluation on three different architectures and an analysis of the performance are presented.
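To make the abstract's central idea concrete, the following is a minimal pure-Python sketch of a batched SPMV and a batched CG solver built on top of it. It is not the Kokkos Kernels API and not the authors' implementation: the function names `batched_spmv`, `dot`, and `batched_cg` are hypothetical, and the sketch assumes (the abstract does not say so) that every system in the batch shares one CSR sparsity pattern and differs only in its values. The nested loops only hint at one plausible mapping to hierarchical parallelism: the outer loop over systems could be distributed across teams and the inner loop over rows across the threads of a team.

```python
# Sketch only: pure-Python batched CSR SpMV and batched CG.
# Assumption (not stated in the abstract): all systems in the batch share
# one CSR sparsity pattern (row_ptr, col_idx) and differ only in values.

def batched_spmv(row_ptr, col_idx, values, x):
    """Return y with y[b] = A[b] @ x[b] for every system b in the batch."""
    n_rows = len(row_ptr) - 1
    y = []
    for vals_b, x_b in zip(values, x):      # outer level: one system at a time
        y_b = [0.0] * n_rows
        for i in range(n_rows):             # inner level: rows of that system
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals_b[k] * x_b[col_idx[k]]
            y_b[i] = acc
        y.append(y_b)
    return y


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def batched_cg(row_ptr, col_idx, values, rhs, tol=1e-10, max_iter=100):
    """Unpreconditioned CG on each symmetric positive definite system.

    A system stops updating once its residual 2-norm is <= tol (absolute).
    """
    n_batch = len(values)
    x = [[0.0] * len(b) for b in rhs]       # zero initial guess
    r = [list(b) for b in rhs]              # r = b - A*0 = b
    p = [list(b) for b in rhs]
    rr = [dot(r_b, r_b) for r_b in r]
    active = [rr_b > tol * tol for rr_b in rr]
    for _ in range(max_iter):
        if not any(active):
            break
        ap = batched_spmv(row_ptr, col_idx, values, p)  # one SpMV for the batch
        for b in range(n_batch):
            if not active[b]:
                continue                    # converged systems are frozen
            alpha = rr[b] / dot(p[b], ap[b])
            x[b] = [xi + alpha * pi for xi, pi in zip(x[b], p[b])]
            r[b] = [ri - alpha * qi for ri, qi in zip(r[b], ap[b])]
            rr_new = dot(r[b], r[b])
            if rr_new <= tol * tol:
                active[b] = False
            else:
                beta = rr_new / rr[b]
                p[b] = [ri + beta * pi for ri, pi in zip(r[b], p[b])]
            rr[b] = rr_new
    return x


# Example batch: two 3x3 tridiagonal systems with the same pattern.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [
    [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],   # tridiag(-1, 2, -1)
    [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],   # tridiag(-1, 4, -1)
]
rhs = [[1.0, 0.0, 1.0], [3.0, 2.0, 3.0]]       # chosen so each solution is all ones
solutions = batched_cg(row_ptr, col_idx, values, rhs)
```

Storing the pattern once and the values per system is what lets a sparse batch use much less memory than a dense batch of the same systems. In this sketch the SPMV is the only operation that touches the matrix, which is consistent with the abstract's remark that it dominates the cost of the Krylov solvers.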

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)
Grant/Contract Number:
NA0003525
OSTI ID:
1962268
Alternate ID(s):
OSTI ID: 1962271; OSTI ID: 2311595
Report Number(s):
SAND-2023-06502J; 10054414
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 34, Issue 5; ISSN 1045-9219
Publisher:
Institute of Electrical and Electronics Engineers
Country of Publication:
United States
Language:
English

