Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Azad, Ariful; Ballard, Grey; Buluc, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel

doi:10.1137/15M104253X

Journal Article · Mon Nov 07 23:00:00 EST 2016 · SIAM Journal on Scientific Computing

DOI: https://doi.org/10.1137/15M104253X · OSTI ID: 1378775

Azad, Ariful [1]; Ballard, Grey [2]; Buluc, Aydin [1]; Demmel, James [3]; Grigori, Laura [4]; Schwartz, Oded [5]; Toledo, Sivan [6]; Williams, Samuel [1]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[2] Sandia National Lab. (SNL-CA), Livermore, CA (United States); Wake Forest Univ., Winston Salem, NC (United States)
[3] Univ. of California, Berkeley, CA (United States)
[4] INRIA Paris-Rocquencourt, Alpines (France)
[5] The Hebrew Univ. (Israel)
[6] Tel Aviv Univ. (Israel)

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
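
As a rough illustration of the idea behind splitting SpGEMM along a third dimension, the following serial Python sketch partitions the inner dimension k of C = A * B into several "layers", multiplies each layer independently, and sums the partial products. It is not the authors' implementation: the dict-of-dicts matrix format, the modulo partitioning, and all function names (`spgemm`, `split_inner`, `add_sparse`, `spgemm_layered`) are assumptions made for this example, and no MPI or threading is involved. In a distributed setting each layer would be handled by a separate group of processes and the final summation would be a communication step.

```python
from collections import defaultdict


def spgemm(A, B):
    """Row-wise sparse product C = A * B.

    A and B are dicts mapping row index -> {column index: value}
    (a hypothetical storage format chosen for brevity).
    """
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C


def split_inner(A, B, layers):
    """Partition the inner dimension k into `layers` pieces (k mod layers)."""
    parts = []
    for layer in range(layers):
        # Layer keeps only the columns of A and the rows of B it owns.
        A_l = {i: {k: v for k, v in row.items() if k % layers == layer}
               for i, row in A.items()}
        B_l = {k: row for k, row in B.items() if k % layers == layer}
        parts.append((A_l, B_l))
    return parts


def add_sparse(X, Y):
    """Entrywise sum of two sparse matrices in dict-of-dicts form."""
    out = {i: dict(row) for i, row in X.items()}
    for i, row in Y.items():
        target = out.setdefault(i, {})
        for j, v in row.items():
            target[j] = target.get(j, 0.0) + v
    return out


def spgemm_layered(A, B, layers):
    """Multiply each layer independently, then reduce the partial products."""
    C = {}
    for A_l, B_l in split_inner(A, B, layers):
        C = add_sparse(C, spgemm(A_l, B_l))
    return C


if __name__ == "__main__":
    A = {0: {0: 1.0, 1: 2.0}, 1: {2: 3.0}}
    B = {0: {1: 4.0}, 1: {0: 5.0, 1: 6.0}, 2: {2: 7.0}}
    print(spgemm(A, B))
    print(spgemm_layered(A, B, layers=2))
```

Both calls should produce the same matrix, since the layered variant only regroups the terms of each sum over k. The per-layer products are independent of one another, which is the property a multi-level parallel scheme can exploit.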

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

Grant/Contract Number: AC02-05CH11231

OSTI ID: 1378775

Alternate ID(s): OSTI ID: 1512883; OSTI ID: 1439191

Journal Information: SIAM Journal on Scientific Computing, Vol. 38, Issue 6; ISSN 1064-8275

Publisher: SIAM

Country of Publication: United States

Language: English

Related Subjects

2.5D algorithms
2D decomposition
3D algorithms
97 MATHEMATICS AND COMPUTING
SpGEMM
graph algorithms
multithreading
numerical linear algebra
parallel computing
sparse matrix-matrix multiplication