U.S. Department of Energy
Office of Scientific and Technical Information

Design Principles for Sparse Matrix Multiplication on the GPU

Conference
Authors: [1]; [2]; [1]
  1. Univ. of California, Davis, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory-access pattern that allows efficient access to both input and output matrices and is crucial to achieving excellent SpMM performance. By combining these two ingredients, (i) merge-based load balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
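To make the two ingredients named in the abstract concrete, the following is a minimal CPU reference sketch, not the paper's GPU kernel. `csr_spmm` computes C = A·B with A in CSR format and B dense in row-major layout, iterating over the columns of B in the inner loop (the access order that coalesces on a GPU); `merge_path_split` illustrates the idea behind merge-based load balancing, where row boundaries and nonzeros are treated as one merged work list and split so each worker gets an equal share regardless of row-length skew. All function and variable names here are illustrative, not from the paper.

```python
def csr_spmm(row_ptr, col_idx, vals, B, n_cols_B):
    """Reference C = A @ B, with A given as CSR arrays and B dense row-major."""
    n_rows = len(row_ptr) - 1
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):   # nonzeros of row i
            a, j = vals[k], col_idx[k]
            for c in range(n_cols_B):                 # row-major sweep of B[j] and C[i]
                C[i][c] += a * B[j][c]
    return C


def merge_path_split(row_ptr, nnz, num_parts):
    """Sketch of merge-based load balancing: treat row boundaries and
    nonzeros as one merged sequence of (n_rows + nnz) work items and
    binary-search the split diagonal for each part."""
    n_rows = len(row_ptr) - 1
    total = n_rows + nnz
    splits = []
    for p in range(num_parts + 1):
        diag = min(p * total // num_parts, total)
        lo, hi = max(0, diag - nnz), min(diag, n_rows)
        while lo < hi:                                # how many row boundaries
            mid = (lo + hi) // 2                      # precede this diagonal?
            if row_ptr[mid + 1] <= diag - mid - 1:
                lo = mid + 1
            else:
                hi = mid
        splits.append((lo, diag - lo))                # (start row, start nonzero)
    return splits
```

For example, `merge_path_split([0, 2, 4], 4, 2)` returns `[(0, 0), (1, 2), (2, 4)]`: each of the two parts receives one row boundary plus two nonzeros, i.e. three work items apiece, even though a row-per-worker split could be arbitrarily imbalanced.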
Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number: AC02-05CH11231
OSTI ID: 1457016
Country of Publication: United States
Language: English


