OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

Journal Article · Parallel Computing
Nagasaka, Yusuke [1]; Matsuoka, Satoshi [2]; Azad, Ariful [3]; Buluç, Aydın [4]
  1. Tokyo Inst. of Technology (Japan)
  2. RIKEN Center for Computational Science, Kobe (Japan)
  3. Indiana Univ., Bloomington, IN (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. In this work, we first identify and mitigate multiple bottlenecks related to memory management and thread scheduling on Intel Xeon Phi (Knights Landing, or KNL). Specifically targeting many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike prior evaluations, ours also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table-based and heap-based algorithms achieve significant speedups over existing libraries in the majority of cases, while other algorithms dominate the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type. We distill these in-depth evaluation results into a recipe for choosing the best SpGEMM algorithm for a target scenario, and we build performance models for the hash-table-based and heap-based algorithms that support the recipe. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix. Finally, we integrate our implementations into a large-scale protein clustering code named HipMCL, accelerating its SpGEMM kernel by up to 10X and achieving an overall performance boost of 2.6X for the whole HipMCL application.
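To make the core idea concrete, below is a minimal sketch of a Gustavson-style, hash-accumulator SpGEMM kernel over CSR matrices. This is an illustration of the general technique only, not the paper's optimized KNL implementation (which adds vectorization, careful memory management, and thread scheduling); the Csr struct and spgemm_hash function are hypothetical names introduced for this example. Note how each output row is emitted straight from the hash table without sorting, which is the source of the unsorted-output speedup described in the abstract.

    // Sketch of hash-accumulator SpGEMM (C = A * B) over CSR matrices (C++17).
    // Illustrative only; names (Csr, spgemm_hash) are hypothetical, not the paper's API.
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    struct Csr {                       // zero-based CSR storage
        int rows = 0, cols = 0;
        std::vector<int> rowptr;       // size rows + 1
        std::vector<int> colidx;       // size nnz
        std::vector<double> val;       // size nnz
    };

    // For each row i of A, scatter-accumulate partial products into a hash
    // table keyed by output column, then emit the row. The row is emitted
    // unsorted; sort the (column, value) pairs first if sorted output rows
    // are required.
    Csr spgemm_hash(const Csr& A, const Csr& B) {
        Csr C;
        C.rows = A.rows; C.cols = B.cols;
        C.rowptr.push_back(0);
        for (int i = 0; i < A.rows; ++i) {
            std::unordered_map<int, double> acc;  // column -> accumulated value
            for (int ka = A.rowptr[i]; ka < A.rowptr[i + 1]; ++ka) {
                int k = A.colidx[ka];
                double a = A.val[ka];
                for (int kb = B.rowptr[k]; kb < B.rowptr[k + 1]; ++kb)
                    acc[B.colidx[kb]] += a * B.val[kb];
            }
            for (const auto& [col, v] : acc) {    // unsorted emission
                C.colidx.push_back(col);
                C.val.push_back(v);
            }
            C.rowptr.push_back((int)C.colidx.size());
        }
        return C;
    }

    int main() {
        // 2x2 identity times itself: expect two nonzeros, both 1.0.
        Csr I{2, 2, {0, 1, 2}, {0, 1}, {1.0, 1.0}};
        Csr C = spgemm_hash(I, I);
        std::printf("nnz(C) = %zu\n", C.colidx.size());
    }

A production kernel along the lines the paper describes would presumably replace std::unordered_map with a flat, per-row-sized hash table determined by a symbolic pass, and would parallelize the outer loop over rows across threads.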

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); Japan Science and Technology Agency (JST); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-05CH11231; JPMJCR1303; JPMJCR1687
OSTI ID:
1559813
Alternate ID(s):
OSTI ID: 1692084
Journal Information:
Parallel Computing, Vol. 90, Issue C; ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 16 works (citation information provided by Web of Science)

References (12)

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks (journal, January 2018)
Near linear time algorithm to detect community structures in large-scale networks (journal, September 2007)
Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid (journal, January 2016)
Solvers for $\mathcal{O}(N)$ Electronic Structure in the Strong Scaling Limit (journal, January 2016)
Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition (journal, September 1978)
Sparse Matrices in MATLAB: Design and Implementation (journal, January 1992)
Optimizing Sparse Matrix-Matrix Multiplication for the GPU (journal, October 2015)
GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging (journal, January 2015)
ViennaCL: Linear Algebra Library for Multi- and Many-Core Architectures (journal, January 2016)
Benchmarking optimization software with performance profiles (journal, January 2002)
The Combinatorial BLAS: design, implementation, and applications (journal, May 2011)
Ternary Sparse Matrix Representation for Volumetric Mesh Subdivision and Processing on GPUs (journal, August 2017)

Cited By (4)

The parallelism motifs of genomic data analysis (journal, January 2020)
  • Yelick, Katherine; Buluç, Aydın; Awan, Muaaz
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166; https://doi.org/10.1098/rsta.2019.0394
Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking (preprint, January 2020)
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices (preprint, January 2020)
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale (preprint, January 2020)

Similar Records

High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures
Conference · August 2018

Storage-Intensive Supercomputing Benchmark Study
Technical Report · October 2007

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Journal Article · May 2021 · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)