Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sandia National Lab. (SNL-CA), Livermore, CA (United States); Wake Forest Univ., Winston Salem, NC (United States)
- Univ. of California, Berkeley, CA (United States)
- INRIA Paris-Rocquencourt, Alpines (France)
- The Hebrew Univ. (Israel)
- Tel Aviv Univ. (Israel)
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1378775
- Alternate ID(s):
- OSTI ID: 1512883; OSTI ID: 1439191
- Journal Information:
- SIAM Journal on Scientific Computing, Vol. 38, Issue 6; ISSN 1064-8275
- Publisher:
- SIAM
- Country of Publication:
- United States
- Language:
- English
| Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication | journal | January 2019 |
| Numerical algorithms for high-performance computational science | journal | January 2020 |
| The parallelism motifs of genomic data analysis | journal | January 2020 |
| BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper | posted_content | March 2020 |
| BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper | book | January 2021 |
Similar Records
Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors