U.S. Department of Energy
Office of Scientific and Technical Information

Fast truncated SVD of sparse and dense matrices on graphics processors

Journal Article · International Journal of High Performance Computing Applications
 [1];  [1];  [2]
  1. Karlsruhe Institute of Technology and Innovative Computing Laboratory, University of Tennessee at Knoxville, Knoxville, TN, USA
  2. Universitat Politècnica de València, València, Spain

We investigate the solution of low-rank matrix approximation problems using the truncated singular value decomposition (SVD). For this purpose, we develop and optimize graphics processing unit (GPU) implementations for the randomized SVD and a blocked variant of the Lanczos approach. Our work takes advantage of the fact that the two methods are composed of very similar linear algebra building blocks, which can be assembled using numerical kernels from existing high-performance linear algebra libraries. Furthermore, the experiments with several sparse matrices arising in representative real-world applications and synthetic dense test matrices reveal a performance advantage of the block Lanczos algorithm when targeting the same approximation accuracy.
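As a rough illustration of the building blocks the abstract mentions (matrix products, tall-skinny QR factorizations, and a small dense SVD), the following is a minimal CPU sketch of a basic randomized truncated SVD in NumPy. It is not the authors' GPU implementation. The function name `randomized_svd` and the parameters `oversample` and `power_iters` are assumptions chosen for illustration, and the default values are arbitrary.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate the leading `rank` singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch size: target rank plus a few extra columns, capped by the matrix size.
    k = min(rank + oversample, min(m, n))

    # Building block 1: multiply A by a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    # Building block 2: orthonormalize the tall-skinny result with a QR factorization.
    Q, _ = np.linalg.qr(A @ Omega)

    # Optional power iterations, re-orthonormalizing after every product.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Building block 3: SVD of the small projected matrix B = Q^T A (k x n).
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Map the left singular vectors back to the original space and truncate.
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]
```

A call such as `U, s, Vt = randomized_svd(A, rank=5)` returns factors for which `(U * s) @ Vt` approximates `A`. A block Lanczos variant would reuse the same kinds of products and orthogonalizations, but build a Krylov subspace instead of a single random sketch.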

Research Organization:
US Department of Energy (USDOE), Washington, DC (United States). Office of Science, Exascale Computing Project
Sponsoring Organization:
USDOE
OSTI ID:
2424934
Journal Information:
International Journal of High Performance Computing Applications, Vol. 37, Issue 3-4; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English

Similar Records

Fast truncated SVD of sparse and dense matrices on graphics processors
Journal Article · June 7, 2023 · International Journal of High Performance Computing Applications · OSTI ID:1984302

Sparse matrix‐vector and matrix‐multivector products for the truncated SVD on graphics processors
Journal Article · August 4, 2023 · Concurrency and Computation: Practice and Experience · OSTI ID:1993862
