Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication

Akbudak, Kadir; Selvitopi, Oguz; Aykanat, Cevdet

doi:10.1145/3155292

Journal Article · January 3, 2018 · ACM Transactions on Parallel Computing

Bilkent Univ., Ankara (Turkey)

We investigate outer-product–parallel, inner-product–parallel, and row-by-row-product–parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed-memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of the input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations, which yields a two-phase approach. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost, and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product–parallel formulation scales better, the row-by-row-product–parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that reducing the latency cost in addition to the bandwidth cost, through the communication hypergraph models, further improves the parallel SpGEMM time by up to 32%.
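
To make the bandwidth versus latency distinction above concrete, the following is a minimal Python sketch (the record itself contains no code) of the row-by-row-product formulation, i.e. Gustavson-style C[i,:] = sum_k A[i,k] * B[k,:], together with the communication cost that a given 1D row-wise partition induces. Assumptions, all ours: matrices are dicts of sparse rows; rows of A and C share the partition `part_A`, rows of B follow `part_B`; volume is counted as the number of B nonzeros shipped; and the helper names `rowwise_spgemm` and `comm_cost` are hypothetical. This is not the authors' partitioning model. It only evaluates the two costs (total volume and message count) that their computational and communication models try to reduce.

```python
from collections import defaultdict


def rowwise_spgemm(A, B):
    """Row-by-row SpGEMM: C[i,:] = sum over k of A[i,k] * B[k,:].
    Matrices are dicts mapping row index -> {column index: value}."""
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        C[i] = dict(acc)
    return C


def comm_cost(A, B, part_A, part_B):
    """Cost of a 1D row-wise partition (hypothetical helper).

    part_A[i]: processor that owns row i of A and computes row i of C.
    part_B[k]: processor that owns row k of B.
    Returns (total_volume, message_count): volume counts B nonzeros sent,
    message count is the number of distinct (sender, receiver) pairs."""
    needed = defaultdict(set)  # (src, dst) -> rows of B that dst must receive
    for i, a_row in A.items():
        dst = part_A[i]
        for k in a_row:
            src = part_B[k]
            if src != dst and B.get(k):
                # A set: a B row needed by several rows on dst is sent once.
                needed[(src, dst)].add(k)
    volume = sum(len(B[k]) for rows in needed.values() for k in rows)
    return volume, len(needed)


A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}, 2: {2: 4.0}}
B = {0: {0: 1.0}, 1: {0: 1.0, 2: 5.0}, 2: {1: 2.0}}

print(rowwise_spgemm(A, B))  # {0: {0: 3.0, 2: 10.0}, 1: {0: 3.0, 2: 15.0}, 2: {1: 8.0}}
print(comm_cost(A, B, [0, 0, 1], [0, 1, 1]))  # (2, 1)
print(comm_cost(A, B, [0, 0, 1], [0, 0, 1]))  # (0, 0)
```

In the first partition, rows 0 and 1 of A (both on processor 0) reference row 1 of B, which lives on processor 1; that row is sent once, giving 2 words in 1 message. Reassigning row 1 of B to processor 0 removes all communication. A real model must also weigh such moves against computational load balance, which this sketch ignores. Counting each needed B row once per receiving processor is the quantity a hypergraph cut metric can capture exactly, whereas a graph edge cut may count it more than once. This is consistent with the trade-off the abstract reports between partitioning quality and the lower overhead of the bipartite graph models.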

Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

Grant/Contract Number: AC02-05CH11231

OSTI ID: 1525287

Journal Information: ACM Transactions on Parallel Computing, Vol. 4, Issue 3; ISSN 2329-4949

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English

References (39)

Amazon.com recommendations: item-to-item collaborative filtering Linden, G.; Smith, B.; York, J. IEEE Internet Computing, Vol. 7, Issue 1 https://doi.org/10.1109/MIC.2003.1167344	journal	January 2003
A Multigrid Tutorial, Second Edition Briggs, William L.; Henson, Van Emden; McCormick, Steve F. Other Titles in Applied Mathematics https://doi.org/10.1137/1.9780898719505	book	January 2000
Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication Ballard, Grey; Druinsky, Alex; Knight, Nicholas Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures - SPAA '15 https://doi.org/10.1145/2755573.2755613	conference	January 2015
Reducing latency cost in 2D sparse matrix partitioning models Selvitopi, Oguz; Aykanat, Cevdet Parallel Computing, Vol. 57 https://doi.org/10.1016/j.parco.2016.04.004	journal	September 2016
SUMMA: scalable universal matrix multiplication algorithm Van De Geijn, R. A.; Watts, J. Concurrency: Practice and Experience, Vol. 9, Issue 4 https://doi.org/10.1002/(SICI)1096-9128(199704)9:4<255::AID-CPE250>3.0.CO;2-2	journal	April 1997
Order-N tight-binding molecular dynamics on parallel computers Itoh, Satoshi; Ordejón, Pablo; Martin, Richard M. Computer Physics Communications, Vol. 88, Issue 2-3 https://doi.org/10.1016/0010-4655(95)00031-A	journal	August 1995
Linear scaling conjugate gradient density matrix search as an alternative to diagonalization for first principles electronic structure calculations Millam, John M.; Scuseria, Gustavo E. The Journal of Chemical Physics, Vol. 106, Issue 13 https://doi.org/10.1063/1.473579	journal	April 1997
Ab initio molecular dynamics: Propagating the density matrix with Gaussian orbitals Schlegel, H. Bernhard; Millam, John M.; Iyengar, Srinivasan S. The Journal of Chemical Physics, Vol. 114, Issue 22 https://doi.org/10.1063/1.1372182	journal	June 2001
Density-matrix electronic-structure method with linear system-size scaling Li, X. -P.; Nunes, R. W.; Vanderbilt, David Physical Review B, Vol. 47, Issue 16 https://doi.org/10.1103/PhysRevB.47.10891	journal	April 1993
Benchmarking optimization software with performance profiles Dolan, Elizabeth D.; Moré, Jorge J. Mathematical Programming, Vol. 91, Issue 2 https://doi.org/10.1007/s101070100263	journal	January 2002
The University of Florida sparse matrix collection Davis, Timothy A.; Hu, Yifan ACM Transactions on Mathematical Software, Vol. 38, Issue 1 https://doi.org/10.1145/2049662.2049663	journal	November 2011
Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices Aykanat, Cevdet; Cambazoglu, B. Barla; Uçar, Bora Journal of Parallel and Distributed Computing, Vol. 68, Issue 5 https://doi.org/10.1016/j.jpdc.2007.09.006	journal	May 2008
Communication optimal parallel multiplication of sparse random matrices Ballard, Grey; Buluc, Aydin; Demmel, James Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures - SPAA '13 https://doi.org/10.1145/2486159.2486196	conference	January 2013
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments Buluç, Aydin; Gilbert, John R. SIAM Journal on Scientific Computing, Vol. 34, Issue 4 https://doi.org/10.1137/110848244	journal	January 2012
A parallel interior point algorithm for linear programming on a network of transputers Bisseling, R. H.; Doup, T. M.; Loyens, L. D. J. C. Annals of Operations Research, Vol. 43, Issue 2 https://doi.org/10.1007/BF02024486	journal	February 1993
Improving communication performance in dense linear algebra via topology aware collectives Solomonik, Edgar; Bhatele, Abhinav; Demmel, James Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11 https://doi.org/10.1145/2063384.2063487	conference	January 2011
Simultaneous Input and Output Matrix Partitioning for Outer-Product–Parallel Sparse Matrix-Matrix Multiplication Akbudak, Kadir; Aykanat, Cevdet SIAM Journal on Scientific Computing, Vol. 36, Issue 5 https://doi.org/10.1137/13092589X	journal	January 2014
A simplified density matrix minimization for linear scaling self-consistent field theory Challacombe, Matt The Journal of Chemical Physics, Vol. 110, Issue 5 https://doi.org/10.1063/1.477969	journal	February 1999
Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies Uçar, Bora; Aykanat, Cevdet SIAM Journal on Scientific Computing, Vol. 25, Issue 6 https://doi.org/10.1137/S1064827502410463	journal	January 2004
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs Karypis, George; Kumar, Vipin SIAM Journal on Scientific Computing, Vol. 20, Issue 1 https://doi.org/10.1137/S1064827595287997	journal	January 1998
An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data Liu, Weifeng; Vinter, Brian 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS) https://doi.org/10.1109/IPDPS.2014.47	conference	May 2014
Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods Bell, Nathan; Dalton, Steven; Olson, Luke N. SIAM Journal on Scientific Computing, Vol. 34, Issue 4 https://doi.org/10.1137/110838844	journal	January 2012
GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging Gremse, Felix; Höfter, Andreas; Schwen, Lars Ole SIAM Journal on Scientific Computing, Vol. 37, Issue 1 https://doi.org/10.1137/130948811	journal	January 2015
The Gamma Matrix to Summarize Dense and Sparse Data Sets for Big Data Analytics Ordonez, Carlos; Zhang, Yiqun; Cabrera, Wellington IEEE Transactions on Knowledge and Data Engineering, Vol. 28, Issue 7 https://doi.org/10.1109/TKDE.2016.2545664	journal	July 2016
Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication Catalyurek, U. V.; Aykanat, C. IEEE Transactions on Parallel and Distributed Systems, Vol. 10, Issue 7 https://doi.org/10.1109/71.780863	journal	July 1999
A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering Karypis, George; Kumar, Vipin Journal of Parallel and Distributed Computing, Vol. 48, Issue 1 https://doi.org/10.1006/jpdc.1997.1403	journal	January 1998
A parallel formulation of interior point algorithms Karypis, George; Gupta, Anshul; Kumar, Vipin Proceedings of the 1994 ACM/IEEE conference on Supercomputing - Supercomputing '94 https://doi.org/10.1145/602770.602808	conference	January 1994
Parallel Triangle Counting and Enumeration Using Matrix Algebra Azad, Ariful; Buluc, Aydin; Gilbert, John 2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW) https://doi.org/10.1109/IPDPSW.2015.75	conference	May 2015
Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition Gustavson, Fred G. ACM Transactions on Mathematical Software, Vol. 4, Issue 3 https://doi.org/10.1145/355791.355796	journal	September 1978
Semiempirical methods with conjugate gradient density matrix search to replace diagonalization for molecular systems containing thousands of atoms Daniels, Andrew D.; Millam, John M.; Scuseria, Gustavo E. The Journal of Chemical Physics, Vol. 107, Issue 2 https://doi.org/10.1063/1.474404	journal	July 1997
On the representation and multiplication of hypersparse matrices Buluc, Aydin; Gilbert, John R. 2008 IEEE International Symposium on Parallel and Distributed Processing (IPDPS) https://doi.org/10.1109/IPDPS.2008.4536313	conference	April 2008
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory Challacombe, Matt Computer Physics Communications, Vol. 128, Issue 1-2 https://doi.org/10.1016/s0010-4655(00)00074-6	journal	June 2000
Sparse matrix multiplication: The distributed block-compressed sparse row library Borštnik, Urban; VandeVondele, Joost; Weber, Valéry Parallel Computing, Vol. 40, Issue 5-6 https://doi.org/10.1016/j.parco.2014.03.012	journal	May 2014
Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures Akbudak, Kadir; Aykanat, Cevdet IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 8 https://doi.org/10.1109/TPDS.2017.2656893	journal	August 2017
Optimization of Linear Recursive Queries in SQL Ordonez, Carlos IEEE Transactions on Knowledge and Data Engineering, Vol. 22, Issue 2 https://doi.org/10.1109/TKDE.2009.83	journal	February 2010
Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase VandeVondele, Joost; Borštnik, Urban; Hutter, Jürg Journal of Chemical Theory and Computation, Vol. 8, Issue 10 https://doi.org/10.1021/ct200897x	journal	March 2012
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication Demmel, James; Eliahu, David; Fox, Armando 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS) https://doi.org/10.1109/IPDPS.2013.80	conference	May 2013
The Combinatorial BLAS: design, implementation, and applications Buluç, Aydın; Gilbert, John R. The International Journal of High Performance Computing Applications, Vol. 25, Issue 4 https://doi.org/10.1177/1094342011403516	journal	May 2011

Cited By (2)

Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight Chen, Yuedan; Xiao, Guoqing; Yang, Wangdong Neural Computing and Applications, Vol. 32, Issue 10 https://doi.org/10.1007/s00521-019-04121-z	journal	March 2019
A Systematic Survey of General Sparse Matrix-Matrix Multiplication Gao, Jianhua; Ji, Weixing; Tan, Zhaonian arXiv https://doi.org/10.48550/arxiv.2002.11273	text	January 2020

Similar Records

Reduce Operations: Send Volume Balancing While Minimizing Latency

Journal Article · January 7, 2020 · IEEE Transactions on Parallel and Distributed Systems

Karsavuran, M. Ozan; Acer, Seher; Aykanat, Cevdet

A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously

Journal Article · June 6, 2016 · IEEE Transactions on Parallel and Distributed Systems

Selvitopi, Oguz; Acer, Seher; Aykanat, Cevdet

Locality-aware and load-balanced static task scheduling for MapReduce

Journal Article · July 27, 2018 · Future Generation Computer Systems

Selvitopi, Oguz; Demirci, Gunduz Vehbi; Turk, Ata; et al.

Related Subjects

97 MATHEMATICS AND COMPUTING
