Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Abstract
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
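The following sequential Python sketch illustrates two ideas from the abstract. It is not the paper's implementation, which is distributed and multithreaded.

- **Local kernel.** `spgemm_gustavson` is a row-by-row (Gustavson-style) SpGEMM that uses a hash-map accumulator.
- **3D split.** `spgemm_3d_layers` imitates the arithmetic of a 3D formulation. It splits the inner dimension k into `c` layers, multiplies each layer's slices independently, and then sums the partial products.

The dict-of-rows format, the function names and the contiguous layer split are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative sparse format (an assumption, not the paper's): a dict of rows,
# {row_index: {col_index: value}}, standing in for CSR storage.


def spgemm_gustavson(A, B):
    """Row-wise (Gustavson-style) SpGEMM: C[i, :] = sum over k of A[i, k] * B[k, :]."""
    C = {}
    for i, row_a in A.items():
        acc = defaultdict(float)  # hash-based sparse accumulator for row i of C
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:  # store only nonempty rows of C
            C[i] = dict(acc)
    return C


def spgemm_3d_layers(A, B, n_inner, c):
    """Sequential simulation of a 3D split: partition the inner dimension k
    into c contiguous layers, multiply each layer's slices independently,
    then reduce (sum) the c partial products."""
    bounds = [n_inner * l // c for l in range(c + 1)]
    partials = []
    for l in range(c):
        lo, hi = bounds[l], bounds[l + 1]
        # Columns [lo, hi) of A and rows [lo, hi) of B belong to layer l.
        A_l = {i: {k: v for k, v in row.items() if lo <= k < hi} for i, row in A.items()}
        B_l = {k: row for k, row in B.items() if lo <= k < hi}
        partials.append(spgemm_gustavson(A_l, B_l))
    # Reduction across layers (a communication step in a distributed code).
    C = defaultdict(lambda: defaultdict(float))
    for C_l in partials:
        for i, row in C_l.items():
            for j, v in row.items():
                C[i][j] += v
    return {i: dict(row) for i, row in C.items()}


if __name__ == "__main__":
    A = {0: {0: 1, 2: 2}, 1: {1: 3}}
    B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
    print(spgemm_gustavson(A, B))        # {0: {1: 16.0}, 1: {0: 15.0}}
    print(spgemm_3d_layers(A, B, 3, 2))  # same result, computed in two layers
```

Each layer touches only its own slice of k, so in a parallel setting the layers can run concurrently. The cost is the final reduction. In distributed 3D formulations, this layering is generally what reduces the data each layer must exchange, in exchange for extra memory and the reduction step.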
- Authors:
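- Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel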
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Wake Forest Univ., Winston Salem, NC (United States)
- Univ. of California, Berkeley, CA (United States)
- French Inst. for Research in Computer Science and Automation (INRIA), Paris (France)
- Hebrew Univ. of Jerusalem (Israel)
- Tel Aviv Univ., Ramat Aviv (Israel)
- Publication Date:
- November 8, 2016
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1512883
- Alternate Identifier(s):
- OSTI ID: 1378775
- Report Number(s):
- SAND2015-8837J
Journal ID: ISSN 1064-8275; 664883
- Grant/Contract Number:
- AC04-94AL85000; AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- SIAM Journal on Scientific Computing
- Additional Journal Information:
- Journal Volume: 38; Journal Issue: 6; Journal ID: ISSN 1064-8275
- Publisher:
- SIAM
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; parallel computing; numerical linear algebra; sparse matrix-matrix multiplication; 2.5D algorithms; 3D algorithms; multithreading; SpGEMM; 2D decomposition; graph algorithms
Citation Formats
Azad, Ariful, Ballard, Grey, Buluç, Aydin, Demmel, James, Grigori, Laura, Schwartz, Oded, Toledo, Sivan, and Williams, Samuel. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication. United States: N. p., 2016. Web. doi:10.1137/15M104253X.
Azad, Ariful, Ballard, Grey, Buluç, Aydin, Demmel, James, Grigori, Laura, Schwartz, Oded, Toledo, Sivan, & Williams, Samuel. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication. United States. https://doi.org/10.1137/15M104253X
Azad, Ariful, Ballard, Grey, Buluç, Aydin, Demmel, James, Grigori, Laura, Schwartz, Oded, Toledo, Sivan, and Williams, Samuel. 2016. "Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication." United States. https://doi.org/10.1137/15M104253X. https://www.osti.gov/servlets/purl/1512883.
@article{osti_1512883,
title = {Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication},
author = {Azad, Ariful and Ballard, Grey and Buluç, Aydin and Demmel, James and Grigori, Laura and Schwartz, Oded and Toledo, Sivan and Williams, Samuel},
abstractNote = {Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.},
doi = {10.1137/15M104253X},
journal = {SIAM Journal on Scientific Computing},
number = 6,
volume = 38,
place = {United States},
year = {2016},
month = nov
}