A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs
We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of the matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.
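To make the idea of matching many unmatched vertices per step concrete, the following is a minimal serial sketch in Python. It is not the authors' implementation: the function name `maximal_matching`, the adjacency-list input `adj_cols`, and the "smallest index wins" tie-breaking rule are all assumptions chosen for illustration. Each round plays the role of one bulk-synchronous step, in which every unmatched column vertex proposes to a free row neighbour and each row keeps a single proposer, loosely analogous to a sparse matrix-vector product with a selection-style reduction.

```python
def maximal_matching(adj_cols, n_rows):
    """Round-based greedy maximal matching in a bipartite graph.

    adj_cols[j] lists the row vertices adjacent to column vertex j
    (hypothetical input format, chosen for this sketch).
    Returns (mate_row, mate_col); -1 marks an unmatched vertex.
    """
    mate_row = [-1] * n_rows          # mate_row[i] = column matched to row i
    mate_col = [-1] * len(adj_cols)   # mate_col[j] = row matched to column j

    while True:
        # One bulk step: all unmatched columns propose at once.
        # proposals[i] holds the smallest column index proposing to row i.
        proposals = {}
        for j, rows in enumerate(adj_cols):
            if mate_col[j] != -1:
                continue  # column j is already matched
            free = [i for i in rows if mate_row[i] == -1]
            if free:
                i = min(free)  # deterministic choice of one free neighbour
                proposals[i] = min(proposals.get(i, j), j)

        if not proposals:
            break  # no unmatched column has a free neighbour: matching is maximal

        # Each column proposed to exactly one row and each row accepted
        # exactly one column, so the accepted pairs never conflict.
        for i, j in proposals.items():
            mate_row[i] = j
            mate_col[j] = i

    return mate_row, mate_col


mate_row, mate_col = maximal_matching([[0, 1], [0], [1, 2]], 3)
print(mate_col)
```

Because every choice in a round depends only on the state at the start of that round and on fixed tie-breaking, the result does not depend on the order in which columns are visited. That is the property one would want when the per-round work is split across processes, although a distributed version would need communication steps that this sketch omits.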
 Authors:
 Azad, Ariful^{[1]}; Buluç, Aydın^{[1]}
 Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
 Publication Date:
 Grant/Contract Number:
 AC02-05CH11231
 Type:
 Accepted Manuscript
 Journal Name:
 Parallel Computing
 Additional Journal Information:
 Journal Volume: 58; Journal Issue: C; Journal ID: ISSN 0167-8191
 Publisher:
 Elsevier
 Research Org:
 Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
 Sponsoring Org:
 USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING; cardinality matching; bipartite graph; parallel algorithm; matrix-algebra
 OSTI Identifier:
 1377506
 Alternate Identifier(s):
 OSTI ID: 1397980
Azad, Ariful, and Buluç, Aydın. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs. United States: N. p.,
Web. doi:10.1016/j.parco.2016.05.007.
Azad, Ariful, & Buluç, Aydın. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs. United States. doi:10.1016/j.parco.2016.05.007.
Azad, Ariful, and Buluç, Aydın. 2016.
"A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs". United States.
doi:10.1016/j.parco.2016.05.007. https://www.osti.gov/servlets/purl/1377506.
@article{osti_1377506,
title = {A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs},
author = {Azad, Ariful and Buluç, Aydın},
abstractNote = {We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of the matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.},
doi = {10.1016/j.parco.2016.05.007},
journal = {Parallel Computing},
number = {C},
volume = 58,
place = {United States},
year = {2016},
month = {5}
}