Design Principles for Sparse Matrix Multiplication on the GPU
- Univ. of California, Davis, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We also identify a key memory access pattern, crucial to SpMM performance, that allows efficient access into both the input and output matrices. By combining two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
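The two ingredients named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: it is a sequential Python model written under stated assumptions. The function names (`spmm_csr_reference`, `merge_path_search`, `spmm_csr_merge`), the flat row-major list layout for the dense matrices, and the `num_parts` parameter are all hypothetical choices made for illustration. The partitioning follows the general merge-path idea of splitting the combined work (row ends plus nonzeros) into equal-sized pieces; the abstract does not give the details of the authors' scheme, so treat this only as one plausible reading.

```python
def spmm_csr_reference(row_ptr, col_idx, vals, B, n):
    """C = A @ B with A in CSR; B and C are dense, row-major flat lists with n columns."""
    m = len(row_ptr) - 1
    C = [0.0] * (m * n)
    for row in range(m):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            a, col = vals[k], col_idx[k]
            # Row-major layout: consecutive j touch consecutive addresses of B and C,
            # which is the access shape that neighbouring GPU threads could coalesce.
            for j in range(n):
                C[row * n + j] += a * B[col * n + j]
    return C


def merge_path_search(diagonal, row_end, nnz):
    """Return (rows_done, nonzeros_done) at the point where `diagonal` crosses the
    merge path between the row-end offsets and the nonzero indices 0..nnz-1."""
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def spmm_csr_merge(row_ptr, col_idx, vals, B, n, num_parts):
    """Same result as the reference, but the work (rows + nonzeros) is cut into
    num_parts equal shares, so one long row cannot overload a single worker."""
    m, nnz = len(row_ptr) - 1, len(vals)
    row_end = row_ptr[1:]
    total = m + nnz
    per_part = -(-total // num_parts)  # ceiling division
    C = [0.0] * (m * n)
    for p in range(num_parts):  # each iteration models one independent worker
        row, k = merge_path_search(min(p * per_part, total), row_end, nnz)
        row_stop, k_stop = merge_path_search(min((p + 1) * per_part, total), row_end, nnz)
        while row < row_stop or k < k_stop:
            if row < row_stop and k == row_end[row]:
                row += 1  # consume a row-end item
            else:
                a, col = vals[k], col_idx[k]  # consume a nonzero item
                for j in range(n):
                    # Sequentially, += also handles rows split across workers;
                    # a parallel version would need an explicit carry-out fix-up.
                    C[row * n + j] += a * B[col * n + j]
                k += 1
    return C


# A = [[1, 0, 2], [0, 0, 0], [3, 4, 5]] in CSR; B = [[1, 2], [3, 4], [5, 6]] row-major.
row_ptr, col_idx, vals = [0, 2, 2, 5], [0, 2, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0]
B = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
C_ref = spmm_csr_reference(row_ptr, col_idx, vals, B, 2)
C_merge = spmm_csr_merge(row_ptr, col_idx, vals, B, 2, num_parts=3)
```

In this toy input the second row is empty and the third row holds most of the nonzeros, so a row-per-worker split would be uneven, whereas each of the three merge-path shares covers at most three work items.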
- Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- DOE Contract Number: AC02-05CH11231
- OSTI ID: 1457016
- Country of Publication: United States
- Language: English
Similar Records
Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations
Conference · Thu Aug 14 00:00:00 EDT 2014 · OSTI ID:1407214
On the performance and energy efficiency of sparse linear algebra on GPUs
Journal Article · Tue Oct 04 20:00:00 EDT 2016 · International Journal of High Performance Computing Applications · OSTI ID:1437692
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
Journal Article · Wed May 31 20:00:00 EDT 2017 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1379875