Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Hussain, Md Taufique; Selvitopi, Oguz; Buluc, Aydin; Azad, Ariful

doi:10.1109/ipdps49936.2021.00018

Journal Article · May 17, 2021 · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)

DOI: https://doi.org/10.1109/ipdps49936.2021.00018 · OSTI ID: 1817306

Hussain, Md Taufique [1]; Selvitopi, Oguz [2]; Buluc, Aydin [2]; Azad, Ariful [1]

[1] Indiana Univ., Bloomington, IN (United States)
[2] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in graph, scientific computing, and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors that generate trillions of nonzeros in the output matrix. Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output. We address these challenges with an integrated communication-avoiding and memory-constrained SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) and can multiply sparse matrices of any size as long as the inputs and a fraction of the output fit in the aggregated memory. Going from 16,384 cores to 262,144 cores on a Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when multiplying large-scale protein-similarity matrices.
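The memory constraint in the abstract (inputs plus only a fraction of the output resident at once) can be illustrated on a single process. The following is a minimal sketch, not the authors' distributed algorithm: it assumes a dict-of-dicts sparse format and a hypothetical helper `spgemm_batched` that forms C = A·B one column batch at a time, so a caller can consume or write out each batch before the next is generated. It does not model the communication-avoiding part of the method.

```python
def spgemm_batched(A, B, n_cols, n_batches):
    """Yield (col_start, col_end, C_batch) pieces of C = A @ B.

    A and B are sparse matrices stored as {row: {col: value}}.
    Only the output entries whose column lies in the current batch
    are held in memory at any time (hypothetical illustration).
    """
    # Ceiling division; at least 1 so the range step is never zero.
    batch_width = max(1, -(-n_cols // n_batches))
    for start in range(0, n_cols, batch_width):
        end = min(start + batch_width, n_cols)
        C_batch = {}
        for i, a_row in A.items():
            acc = {}  # sparse accumulator for row i, restricted to this batch
            for k, a_ik in a_row.items():
                for j, b_kj in B.get(k, {}).items():
                    if start <= j < end:
                        acc[j] = acc.get(j, 0) + a_ik * b_kj
            if acc:
                C_batch[i] = acc
        yield start, end, C_batch


# Usage: multiply two small matrices in 2 column batches.
A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {0: 4, 2: 5}, 1: {1: 6, 2: 7}}
batches = list(spgemm_batched(A, B, n_cols=3, n_batches=2))

# Reassemble the full product only to check the result.
C = {}
for _, _, piece in batches:
    for i, row in piece.items():
        C.setdefault(i, {}).update(row)
```

Batching trades extra passes over the inputs for a smaller peak output footprint; in this sketch each batch rescans all of A and B, which a real implementation would try to avoid.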

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC02-05CH11231

OSTI ID: 1817306

Journal Information: Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vol. 2021; ISSN 1530-2075

Publisher: IEEE

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
genomics
graph theory
mathematics computing
matrix algebra
matrix multiplication
memory management
multiprocessing systems
parallel machines
parallel processing
proteins
resource allocations
scientific computing
social networking
sparse matrices
three-dimensional displays
