Title: Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Journal Article · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)
 [1];  [2];  [2];  [1]
  1. Indiana Univ., Bloomington, IN (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors generating trillions of nonzeros in the output matrix. Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output. We address these challenges with an integrated communication-avoiding and memory-constrained SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) and can multiply sparse matrices of any size as long as the inputs and a fraction of the output fit in the aggregated memory. As we go from 16,384 cores to 262,144 cores on a Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when multiplying large-scale protein-similarity matrices.
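
The memory-constrained idea in the abstract (only the inputs and a fraction of the output need to fit in memory) can be illustrated with a small single-process sketch. This is not the paper's distributed implementation: the dict-of-rows storage, the function names `spgemm_rows` and `batched_spgemm`, and the choice to batch over output columns are all assumptions made for illustration. Each batch produces one column slice of the output, which a caller could consume or write out before the next slice is computed.

```python
from collections import defaultdict

def spgemm_rows(A, B, col_lo, col_hi):
    """Row-wise (Gustavson-style) product, keeping only output
    columns j with col_lo <= j < col_hi.
    Matrices are dicts of rows: {row: {col: value}} (assumed format)."""
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)          # sparse accumulator for row i
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                if col_lo <= j < col_hi:  # skip columns outside this batch
                    acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C

def batched_spgemm(A, B, n_cols, n_batches):
    """Yield (col_lo, col_hi, partial_C) for one column batch at a time,
    so only one slice of the output is held in memory."""
    step = -(-n_cols // n_batches)        # ceiling division
    for col_lo in range(0, n_cols, step):
        col_hi = min(col_lo + step, n_cols)
        yield col_lo, col_hi, spgemm_rows(A, B, col_lo, col_hi)

# Hypothetical 2x3 and 3x4 inputs.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0, 3: 6.0}, 2: {1: 7.0, 2: 8.0}}

# Merge the batches only to check them against the unbatched product;
# a memory-constrained caller would process each slice and discard it.
full = defaultdict(dict)
for col_lo, col_hi, part in batched_spgemm(A, B, n_cols=4, n_batches=2):
    for i, row in part.items():
        full[i].update(row)
full = dict(full)
```

In this sketch the filter on `j` saves memory but not work, since every batch still scans all of `B`. A more careful version would split `B` by column once and multiply against each piece.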

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1817306
Journal Information:
Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2021; Conference: 2021 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), Portland, OR (United States), 17-21 May 2021; ISSN 1530-2075
Publisher:
IEEE
Country of Publication:
United States
Language:
English
