Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single-source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
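As an illustration of the multiple-source search the abstract refers to, the following is a minimal serial sketch of a plain multi-source BFS matching loop. It is not the authors' algorithm: it omits tree grafting, direction-optimizing BFS, and all parallelism, and it rebuilds every search tree from scratch in each phase, which is exactly the redundant work the abstract says tree grafting is meant to reduce. The function name `ms_bfs_matching` and the adjacency-list input format are assumptions made for this example.

```python
from collections import deque


def ms_bfs_matching(adj, n_left, n_right):
    """Maximum cardinality matching by repeated multi-source BFS (serial sketch).

    adj[u] lists the right-side vertices adjacent to left-side vertex u.
    Returns (mate_l, mate_r), where -1 marks an unmatched vertex.
    """
    mate_l = [-1] * n_left
    mate_r = [-1] * n_right
    while True:
        # One phase: grow vertex-disjoint alternating trees from every
        # unmatched left vertex at the same time.
        root = [-1] * n_left        # root of the tree holding each left vertex
        parent_r = [-1] * n_right   # left vertex from which a right vertex was reached
        leaf = {}                   # root -> unmatched right vertex ending a path
        frontier = deque()
        for u in range(n_left):
            if mate_l[u] == -1:
                root[u] = u
                frontier.append(u)
        while frontier:
            u = frontier.popleft()
            r = root[u]
            if r in leaf:
                continue            # this tree already found an augmenting path
            for v in adj[u]:
                if parent_r[v] != -1:
                    continue        # right vertex already claimed by some tree
                parent_r[v] = u
                w = mate_r[v]
                if w == -1:
                    leaf[r] = v     # augmenting path from r ends at v
                    break
                root[w] = r         # follow the matched edge into the same tree
                frontier.append(w)
        if not leaf:
            return mate_l, mate_r   # no augmenting path left: matching is maximum
        # Augment along each discovered path, flipping edges from leaf to root.
        for v in leaf.values():
            while v != -1:
                u = parent_r[v]
                previous = mate_l[u]
                mate_l[u] = v
                mate_r[v] = u
                v = previous


mate_l, mate_r = ms_bfs_matching([[0, 1], [0], [1, 2]], 3, 3)
print(mate_l)
```

Because each right vertex is claimed by at most one tree per phase, the augmenting paths found in a phase are vertex-disjoint and can be applied together; a parallel version would have to coordinate that claiming step across threads.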
 Authors:
 Azad, Ariful [1]; Buluç, Aydın [1]; Pothen, Alex [2]
 [1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
 [2] Purdue Univ., West Lafayette, IN (United States). Dept. of Computer Science
 Publication Date:
 Grant/Contract Number:
 AC02-05CH11231; FG02-13ER26135; CCF-1218196; 1552323
 Type:
 Accepted Manuscript
 Journal Name:
 IEEE Transactions on Parallel and Distributed Systems
 Additional Journal Information:
 Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1045-9219
 Publisher:
 IEEE
 Research Org:
 Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
 Sponsoring Org:
 USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING; cardinality matching; bipartite graph; tree grafting; parallel algorithms
 OSTI Identifier:
 1379627
Azad, Ariful, Buluç, Aydın, and Pothen, Alex. Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting. United States: N. p., 2016. Web. doi:10.1109/TPDS.2016.2546258.
Azad, Ariful, Buluç, Aydın, & Pothen, Alex. Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting. United States. doi:10.1109/TPDS.2016.2546258.
Azad, Ariful, Buluç, Aydın, and Pothen, Alex. 2016. "Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting". United States. doi:10.1109/TPDS.2016.2546258. https://www.osti.gov/servlets/purl/1379627.
@article{osti_1379627,
title = {Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting},
author = {Azad, Ariful and Buluç, Aydın and Pothen, Alex},
abstractNote = {It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single-source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.},
doi = {10.1109/TPDS.2016.2546258},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 1,
volume = 28,
place = {United States},
year = {2016},
month = {3}
}