Title: A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm

Abstract

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the emerging GraphBLAS standard and is the workhorse of many graph algorithms including breadth-first search, bipartite graph matching, and maximal independent set. As thread counts increase, existing multithreaded SpMSpV algorithms can spend more time accessing the sparse matrix data structure than doing arithmetic. Our shared-memory parallel SpMSpV algorithm is work efficient in the sense that its total work is proportional to the number of arithmetic operations required. The key insight is to avoid having each thread individually scan the list of matrix columns. Our algorithm is simple to implement and operates on existing column-based sparse matrix formats. It performs well on diverse matrices and vectors with heterogeneous sparsity patterns. A high-performance implementation of the algorithm attains up to 15x speedup on a 24-core Intel Ivy Bridge processor and up to 49x speedup on a 64-core Intel KNL manycore processor. In contrast to implementations of existing algorithms, the performance of our algorithm is sustained on a variety of input types, including matrices representing scale-free and high-diameter graphs.
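
To make the SpMSpV operation concrete, the following is a minimal sequential sketch, assuming the matrix is stored in compressed sparse column (CSC) form and the input vector is a list of (index, value) pairs. It is not the parallel algorithm from the paper; it only illustrates why a column-based format lets the work scale with the number of multiplications rather than with the matrix size. The function name `spmspv_csc` and its parameter names are hypothetical, chosen for this illustration.

```python
from typing import Dict, List, Tuple


def spmspv_csc(
    col_ptr: List[int],
    row_idx: List[int],
    vals: List[float],
    x: List[Tuple[int, float]],
) -> List[Tuple[int, float]]:
    """Compute y = A * x for a CSC matrix A and a sparse vector x.

    col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j;
    row_idx and vals hold their row indices and values.
    x is a list of (column index, value) pairs for its nonzeros.
    """
    acc: Dict[int, float] = {}  # sparse accumulator keyed by row index
    for j, xj in x:
        # Only columns matching nonzeros of x are touched, so the number of
        # multiplications equals the total nonzeros in those columns.
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[k]
            acc[i] = acc.get(i, 0.0) + vals[k] * xj
    # Sorting is only for a deterministic output order; it is not part of
    # the arithmetic and adds a logarithmic factor to this sketch.
    return sorted(acc.items())


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] in CSC form (assumed example data)
col_ptr = [0, 2, 3, 5]
row_idx = [0, 2, 1, 0, 2]
vals = [1.0, 4.0, 3.0, 2.0, 5.0]
x = [(0, 2.0), (2, 1.0)]  # x = [2, 0, 1]
y = spmspv_csc(col_ptr, row_idx, vals, x)
```

In this sketch column 1 is never read because x has no nonzero at index 1, which is the property the abstract refers to as work proportional to the required arithmetic.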

Authors:
Azad, Ariful [1]; Buluc, Aydin [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2017-07-03
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1525227
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Additional Journal Information:
Journal Name: Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS); Journal Volume: 2017; Conference: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL (United States), 29 May - 2 Jun 2017; Journal ID: ISSN 1530-2075
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Azad, Ariful, and Buluc, Aydin. A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm. United States: N. p., 2017. Web. doi:10.1109/IPDPS.2017.76.
Azad, Ariful, & Buluc, Aydin. A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm. United States. https://doi.org/10.1109/IPDPS.2017.76
Azad, Ariful, and Buluc, Aydin. 2017. "A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm". United States. https://doi.org/10.1109/IPDPS.2017.76. https://www.osti.gov/servlets/purl/1525227.
@article{osti_1525227,
title = {A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm},
author = {Azad, Ariful and Buluc, Aydin},
abstractNote = {We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the emerging GraphBLAS standard and is the workhorse of many graph algorithms including breadth-first search, bipartite graph matching, and maximal independent set. As thread counts increase, existing multithreaded SpMSpV algorithms can spend more time accessing the sparse matrix data structure than doing arithmetic. Our shared-memory parallel SpMSpV algorithm is work efficient in the sense that its total work is proportional to the number of arithmetic operations required. The key insight is to avoid having each thread individually scan the list of matrix columns. Our algorithm is simple to implement and operates on existing column-based sparse matrix formats. It performs well on diverse matrices and vectors with heterogeneous sparsity patterns. A high-performance implementation of the algorithm attains up to 15x speedup on a 24-core Intel Ivy Bridge processor and up to 49x speedup on a 64-core Intel KNL manycore processor. In contrast to implementations of existing algorithms, the performance of our algorithm is sustained on a variety of input types, including matrices representing scale-free and high-diameter graphs.},
doi = {10.1109/IPDPS.2017.76},
journal = {Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
volume = {2017},
place = {United States},
year = {2017},
month = {7}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 21 works
Citation information provided by
Web of Science

Figures / Tables:

TABLE I: Classification of parallel SpMSpV algorithms. t denotes the number of threads. SpMSpV-bucket is presented in this paper.
