OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: Design Principles for Sparse Matrix Multiplication on the GPU

Abstract

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern, crucial to excellent SpMM performance, that allows efficient access into both the input and output matrices. By combining these two ingredients---(i) merge-based load balancing and (ii) row-major coalesced memory access---we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
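The abstract's starting point, SpMM on a CSR-format input, can be illustrated with a minimal serial reference sketch. This is not the authors' GPU kernel (their contribution is the merge-based load balancing and coalesced access on top of this computation); the function name and example data below are illustrative only. The comment on the inner loop notes where row-major layout would matter on a GPU.

```python
def spmm_csr(row_ptr, col_idx, vals, B, n_cols_B):
    """Reference CSR SpMM: C = A @ B, with A sparse (CSR) and B, C dense.

    row_ptr[i]:row_ptr[i+1] indexes the nonzeros of row i of A;
    col_idx[k] and vals[k] give their column and value.
    """
    n_rows = len(row_ptr) - 1
    # Dense row-major output: on a GPU, consecutive threads writing
    # consecutive elements of C[i] would produce coalesced accesses.
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]        # nonzero A[i, col_idx[k]]
            j = col_idx[k]
            for c in range(n_cols_B):
                C[i][c] += a * B[j][c]
    return C

# 2x3 sparse A = [[1, 0, 2], [0, 3, 0]] in CSR form:
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3x2 dense
print(spmm_csr(row_ptr, col_idx, vals, B, 2))  # [[3.0, 2.0], [0.0, 3.0]]
```

Note that the outer loop's work per row is proportional to that row's nonzero count, which is exactly the imbalance the paper's merge-based load balancing addresses.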

Authors:
Yang, Carl [1]; Buluç, Aydın [2]; Owens, John D. [1]
  1. Univ. of California, Davis, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
August 27, 2018
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1457016
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: 24th International European Conference on Parallel and Distributed Computing (Euro-Par 2018), Turin (Italy), 29-31 Aug 2018
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Yang, Carl, Buluç, Aydın, and Owens, John D. Design Principles for Sparse Matrix Multiplication on the GPU. United States: N. p., 2018. Web. doi:10.1007/978-3-319-96983-1_48.
Yang, Carl, Buluç, Aydın, & Owens, John D. Design Principles for Sparse Matrix Multiplication on the GPU. United States. doi:10.1007/978-3-319-96983-1_48.
Yang, Carl, Buluç, Aydın, and Owens, John D. 2018. "Design Principles for Sparse Matrix Multiplication on the GPU". United States. doi:10.1007/978-3-319-96983-1_48. https://www.osti.gov/servlets/purl/1457016.
@inproceedings{osti_1457016,
title = {Design Principles for Sparse Matrix Multiplication on the GPU},
author = {Yang, Carl and Buluç, Aydın and Owens, John D.},
abstractNote = {We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern, crucial to excellent SpMM performance, that allows efficient access into both the input and output matrices. By combining these two ingredients---(i) merge-based load balancing and (ii) row-major coalesced memory access---we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.},
booktitle = {24th International European Conference on Parallel and Distributed Computing (Euro-Par 2018)},
doi = {10.1007/978-3-319-96983-1_48},
place = {United States},
year = {2018},
month = {aug}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
