Performance Portable Batched Sparse Linear Solvers

Liegeois, Kim; Rajamanickam, Sivasankaran; Berger-Vergiat, Luc

doi:10.1109/TPDS.2023.3249110

Journal Article · May 1, 2023 · IEEE Transactions on Parallel and Distributed Systems

DOI: https://doi.org/10.1109/TPDS.2023.3249110 · OSTI ID: 1962268

Center for Computing Research, Sandia National Laboratories, Albuquerque, NM, USA

Solving a large number of small linear systems is increasingly becoming a bottleneck in computational science applications. While dense linear solvers for such systems have been studied before, batched sparse linear solvers are just starting to emerge. In this paper, we discuss algorithms for solving batched sparse linear systems and their implementation in the Kokkos Kernels library. The new algorithms are performance portable and map well to the hierarchical parallelism available in modern accelerator architectures. The sparse matrix-vector product (SPMV) kernel is the main performance bottleneck of the Krylov solvers we implement in this work. The implementation of the batched SPMV and its performance are therefore discussed thoroughly in this paper. The implemented kernels are tested on different Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures. We also develop batched Conjugate Gradient (CG) and batched Generalized Minimum Residual (GMRES) solvers using the batched SPMV. Our proposed solver solved 20,000 sparse linear systems on V100 GPUs with a mean speedup of 76x over a parallel sparse solver applied to a single block-diagonal system containing all the small linear systems, and of 924x over solving the small systems one at a time. Compared to the dense batched solver of cuSOLVER on V100, we see a mean speedup of 0.51x while using far less memory. A thorough performance evaluation on three different architectures and an analysis of the performance are presented.
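The abstract describes a batched SPMV kernel that the batched CG and GMRES solvers are built on. As a rough illustration only, the idea can be sketched in plain Python. This is not the Kokkos Kernels implementation or its API: the function names `batched_spmv` and `batched_cg` are hypothetical, and the sketch assumes that every matrix in the batch shares one CSR sparsity pattern and differs only in its nonzero values, which the abstract does not state. The outer loop over systems stands in for the coarse level of hierarchical parallelism; the loops over rows and nonzeros stand in for the finer levels.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def batched_spmv(row_ptr, col_idx, values, x):
    """Compute y[s] = A[s] @ x[s] for every system s in the batch.

    row_ptr, col_idx: CSR pattern shared by all matrices (assumption).
    values[s][k]:     k-th stored nonzero of matrix s.
    x[s][j]:          j-th entry of the input vector of system s.
    """
    n_rows = len(row_ptr) - 1
    y = []
    for vals_s, x_s in zip(values, x):        # one system per outer iteration
        y_s = [0.0] * n_rows
        for i in range(n_rows):               # one row per inner iteration
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals_s[k] * x_s[col_idx[k]]
            y_s[i] = acc
        y.append(y_s)
    return y


def batched_cg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=100):
    """Unpreconditioned CG applied to each system of the batch.

    Assumes every matrix is symmetric positive definite. Each system
    stops on its own once its residual 2-norm drops to tol or below.
    """
    n_sys, n = len(values), len(row_ptr) - 1
    x = [[0.0] * n for _ in range(n_sys)]
    r = [list(b_s) for b_s in b]              # r = b - A*0 = b
    p = [list(r_s) for r_s in r]
    rr = [dot(r_s, r_s) for r_s in r]
    active = [rr_s > tol * tol for rr_s in rr]
    for _ in range(max_iter):
        if not any(active):
            break
        Ap = batched_spmv(row_ptr, col_idx, values, p)
        for s in range(n_sys):
            if not active[s]:
                continue                      # converged systems are skipped
            alpha = rr[s] / dot(p[s], Ap[s])
            for i in range(n):
                x[s][i] += alpha * p[s][i]
                r[s][i] -= alpha * Ap[s][i]
            rr_new = dot(r[s], r[s])
            if rr_new <= tol * tol:
                active[s] = False
            else:
                beta = rr_new / rr[s]
                for i in range(n):
                    p[s][i] = r[s][i] + beta * p[s][i]
            rr[s] = rr_new
    return x


# Two 3x3 tridiagonal SPD systems sharing one sparsity pattern.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [
    [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
    [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],
]
rhs = [[1.0, 0.0, 1.0], [2.0, 6.0, 2.0]]
solutions = batched_cg(row_ptr, col_idx, values, rhs)
```

Storing the pattern once and only the values per system is one plausible reason a sparse batched format needs less memory than a dense batched one, since a dense format stores every entry of every matrix; the sketch makes no claim about how the paper's kernels lay out their data.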

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC)

Grant/Contract Number: NA0003525

OSTI ID: 1962268

Alternate ID(s): OSTI ID: 1962271; OSTI ID: 2311595

Report Number(s): SAND-2023-06502J; 10054414

Journal Information: IEEE Transactions on Parallel and Distributed Systems, Vol. 34, Issue 5; ISSN 1045-9219

Publisher: Institute of Electrical and Electronics Engineers

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Batch sparse solvers
batch BLAS
kokkos kernels
performance portable
