U.S. Department of Energy
Office of Scientific and Technical Information

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Journal Article · SIAM Journal on Scientific Computing
DOI: https://doi.org/10.1137/15M104253X · OSTI ID: 1378775
 [1];  [2];  [1];  [3];  [4];  [5];  [6];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States); Wake Forest Univ., Winston Salem, NC (United States)
  3. Univ. of California, Berkeley, CA (United States)
  4. INRIA Paris-Rocquencourt, Alpines (France)
  5. The Hebrew Univ. (Israel)
  6. Tel Aviv Univ. (Israel)

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
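
For readers unfamiliar with the primitive, the following is a minimal single-process sketch of SpGEMM, not the authors' implementation. It assumes a dict-of-dicts sparse format and a row-wise accumulation scheme, and it illustrates one common way to think about a layered ("3D") formulation: the inner dimension is split into slices, each slice yields a partial product, and the partial products are summed. The function names (`spgemm`, `split_inner`, `add_sparse`, `spgemm_layered`) and the `k % layers` split are hypothetical choices made for illustration; the abstract does not specify the data layout or the communication pattern.

```python
from collections import defaultdict

def spgemm(A, B):
    """Row-wise sparse product C = A * B.

    Matrices are dict-of-dicts: {row: {col: value}} (an assumed format).
    """
    C = {}
    for i, row_a in A.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C

def split_inner(A, B, layers):
    """Split the inner dimension k into `layers` slices (here: k mod layers)."""
    parts = []
    for l in range(layers):
        A_l = {}
        for i, row in A.items():
            kept = {k: v for k, v in row.items() if k % layers == l}
            if kept:
                A_l[i] = kept
        B_l = {k: row for k, row in B.items() if k % layers == l}
        parts.append((A_l, B_l))
    return parts

def add_sparse(X, Y):
    """Entry-wise sum of two sparse matrices."""
    Z = {i: dict(row) for i, row in X.items()}
    for i, row in Y.items():
        dest = Z.setdefault(i, {})
        for j, v in row.items():
            dest[j] = dest.get(j, 0.0) + v
    return Z

def spgemm_layered(A, B, layers):
    """Sum of per-slice partial products; stands in for a cross-layer reduction."""
    C = {}
    for A_l, B_l in split_inner(A, B, layers):
        C = add_sparse(C, spgemm(A_l, B_l))
    return C

# Small example: A is 2x3, B is 3x2.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
C_flat = spgemm(A, B)
C_layered = spgemm_layered(A, B, layers=2)
```

In this sketch the slices are processed one after another in a single process; in a distributed setting each slice would be handled by a separate group of processes, and the final summation would be where the communication happens.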

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1378775
Alternate ID(s):
OSTI ID: 1512883
OSTI ID: 1439191
Journal Information:
SIAM Journal on Scientific Computing, Vol. 38, Issue 6; ISSN 1064-8275
Publisher:
SIAM
Country of Publication:
United States
Language:
English
