Sparse matrix‐vector and matrix‐multivector products for the truncated SVD on graphics processors

Aliaga, José I.; Anzt, Hartwig; Quintana‐Ortí, Enrique S.; Tomás, Andrés E.

doi:10.1002/cpe.7871

Journal Article · Fri Aug 04 00:00:00 EDT 2023 · Concurrency and Computation. Practice and Experience

DOI: https://doi.org/10.1002/cpe.7871 · OSTI ID: 1993862

Aliaga, José I. ^[1]; Anzt, Hartwig ^[2]; Quintana‐Ortí, Enrique S. ^[3]; Tomás, Andrés E. ^[4]

[1] Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón de la Plana, Spain
[2] Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany; Innovative Computing Lab, University of Tennessee, Knoxville (Tennessee), USA
[3] Depto. de Informática de Sistemas y Computadores, Universitat Politècnica de València, Valencia, Spain
[4] Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón de la Plana, Spain; Depto. de Informática, Universitat de València, Valencia, Spain

Summary

Many practical algorithms for numerical rank computations implement an iterative procedure that involves repeated multiplications of a vector, or a collection of vectors, with both a sparse matrix and its transpose. Unfortunately, the realization of these sparse products in current high-performance libraries often delivers much lower arithmetic throughput when the matrix involved in the product is transposed. In this work, we propose a hybrid sparse matrix layout, named CSRC, that combines the flexibility of some well‐known sparse formats to offer a number of appealing properties: (1) CSRC can be obtained at low cost from the popular CSR (compressed sparse row) format; (2) CSRC has storage requirements similar to those of CSR; and, especially, (3) the implementation of the sparse product kernels delivers high performance for both the direct product and its transposed variant on modern graphics accelerators, thanks to a significant reduction of atomic operations compared to a conventional implementation based on CSR. This solution thus yields considerably higher performance when integrated into an iterative algorithm for the truncated singular value decomposition (SVD), such as the randomized SVD or, as demonstrated in the experimental results, the block Golub–Kahan–Lanczos algorithm.
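To make the contrast between the direct and the transposed product concrete, here is a minimal sequential sketch of a conventional CSR-based matrix-multivector product. It is our own illustration, not the authors' CSRC kernels (whose layout is not detailed in this record); the names `CSR`, `csr_spmm` and `csr_spmm_transposed` are hypothetical. The direct product Y = A X gathers along each row, so a row-per-thread GPU mapping writes every output row exactly once. The transposed product Z = Aᵀ W, computed straight from CSR without forming the transpose, scatters each nonzero into row `col_idx[p]` of Z; under the same mapping, different threads may update the same row of Z, so each update would need an atomic add. The function counts these scatter updates.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # dense multivector, row-major: rows x k columns


class CSR:
    """Compressed sparse row storage of an m x n matrix (hypothetical helper)."""

    def __init__(self, m: int, n: int, row_ptr: List[int],
                 col_idx: List[int], vals: List[float]) -> None:
        assert len(row_ptr) == m + 1 and row_ptr[0] == 0
        assert row_ptr[-1] == len(col_idx) == len(vals)
        self.m, self.n = m, n
        self.row_ptr, self.col_idx, self.vals = row_ptr, col_idx, vals


def csr_spmm(A: CSR, X: Matrix) -> Matrix:
    """Direct product Y = A X (X is n x k).

    Row i of Y depends only on row i of A, so a row-per-thread
    mapping writes Y[i] without any write conflicts."""
    k = len(X[0]) if X else 0
    Y = [[0.0] * k for _ in range(A.m)]
    for i in range(A.m):
        for p in range(A.row_ptr[i], A.row_ptr[i + 1]):
            j, a = A.col_idx[p], A.vals[p]
            for c in range(k):
                Y[i][c] += a * X[j][c]
    return Y


def csr_spmm_transposed(A: CSR, W: Matrix) -> Tuple[Matrix, int]:
    """Transposed product Z = A^T W (W is m x k), computed from CSR directly.

    Row i of A scatters into rows col_idx[p] of Z. With one thread per row
    of A, two rows sharing a column would update the same row of Z, so on a
    GPU each update below would have to be an atomic add.
    Returns Z and the number of scatter updates performed."""
    k = len(W[0]) if W else 0
    Z = [[0.0] * k for _ in range(A.n)]
    scatter_updates = 0
    for i in range(A.m):
        for p in range(A.row_ptr[i], A.row_ptr[i + 1]):
            j, a = A.col_idx[p], A.vals[p]
            for c in range(k):
                Z[j][c] += a * W[i][c]  # would be an atomic add on a GPU
                scatter_updates += 1
    return Z, scatter_updates


if __name__ == "__main__":
    # A = [[1, 0, 2, 0], [0, 3, 0, 4], [5, 0, 0, 6]]
    A = CSR(3, 4, [0, 2, 4, 6], [0, 2, 1, 3, 0, 3],
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(csr_spmm(A, X))          # [[3.0, 2.0], [8.0, 3.0], [17.0, 0.0]]
    Z, updates = csr_spmm_transposed(A, W)
    print(Z)                       # [[6.0, 5.0], [0.0, 3.0], [2.0, 0.0], [6.0, 10.0]]
    print(updates)                 # 12 (= nnz * k)
```

In this naive mapping, the transposed kernel issues one potentially conflicting update per nonzero and per vector (nnz × k). That is the atomic-operation overhead the abstract says CSRC significantly reduces.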

Sponsoring Organization: USDOE

OSTI ID: 1993862

Alternate ID(s): OSTI ID: 1995857

Journal Information: Concurrency and Computation. Practice and Experience, Vol. 35, Issue 28; ISSN 1532-0626

Publisher: Wiley Blackwell (John Wiley & Sons)

Country of Publication: United Kingdom

Language: English

Similar Records

Fast truncated SVD of sparse and dense matrices on graphics processors

Journal Article · Wed Jun 07 00:00:00 EDT 2023 · International Journal of High Performance Computing Applications · OSTI ID:2424934

Fast truncated SVD of sparse and dense matrices on graphics processors

Journal Article · Tue Jun 06 20:00:00 EDT 2023 · International Journal of High Performance Computing Applications · OSTI ID:1984302
