Title: Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication

Journal Article · ACM Transactions on Parallel Computing
DOI: https://doi.org/10.1145/3155292 · OSTI ID: 1525287

We investigate outer-product-parallel, inner-product-parallel, and row-by-row-product-parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product-parallel formulation scales better, the row-by-row-product-parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by using the communication hypergraph models to reduce the latency cost in addition to the bandwidth cost, the parallel SpGEMM time can be further improved by up to 32%.
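
As a rough illustration of the three formulations named in the abstract, the following sequential Python sketch computes C = AB in each of the three ways and checks that the results agree. It is not the authors' implementation: the dictionary-of-rows storage format, the function names, and the small test matrices are all assumptions made for this example, and nothing here models the distributed partitioning or communication.

```python
from collections import defaultdict

# Toy sparse format assumed for this sketch: {row: {col: value}}.

def transpose(M):
    """Return the transpose of M in the same {row: {col: value}} format."""
    T = defaultdict(dict)
    for i, row in M.items():
        for j, v in row.items():
            T[j][i] = v
    return dict(T)

def spgemm_row_by_row(A, B):
    """C(i,:) = sum over k of A(i,k) * B(k,:)."""
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C

def spgemm_outer(A, B):
    """C = sum over k of the outer product A(:,k) * B(k,:)."""
    At = transpose(A)  # At[k] holds column k of A
    C = defaultdict(lambda: defaultdict(float))
    for k, a_col in At.items():
        for i, a_ik in a_col.items():
            for j, b_kj in B.get(k, {}).items():
                C[i][j] += a_ik * b_kj
    return {i: dict(row) for i, row in C.items()}

def spgemm_inner(A, B):
    """C(i,j) = inner product of row A(i,:) and column B(:,j)."""
    Bt = transpose(B)  # Bt[j] holds column j of B
    C = {}
    for i, a_row in A.items():
        for j, b_col in Bt.items():
            common = a_row.keys() & b_col.keys()
            if common:
                C.setdefault(i, {})[j] = sum(a_row[k] * b_col[k] for k in common)
    return C

# Hypothetical 2x3 and 3x2 inputs, chosen only to exercise the code.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}

assert spgemm_row_by_row(A, B) == spgemm_outer(A, B) == spgemm_inner(A, B)
```

The three functions differ only in which index drives the outermost loop: rows of A, the shared index k, or entries of C. Each formulation therefore touches the inputs and the output in a different pattern, so distributing its work across processors requires different data to be exchanged.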

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1525287
Journal Information:
ACM Transactions on Parallel Computing, Vol. 4, Issue 3; ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
