Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.
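The idea of partitioning the set of triangles, rather than only the adjacency matrix, can be illustrated with a small sketch. This is not the authors' implementation: the contiguous vertex partition, the function name `count_triangles_blocked`, and the set-based block storage are all assumptions made for illustration. Each task is a triple of block indices (i, j, k) with i <= j <= k, and every triangle belongs to exactly one task, so tasks can be processed independently and their counts summed.

```python
from itertools import combinations_with_replacement


def count_triangles_blocked(edges, num_vertices, num_blocks):
    """Count triangles by splitting the work into independent block-triple tasks.

    Hypothetical sketch: vertices 0..num_vertices-1 are split into contiguous
    blocks, which is an assumption and not the partitioning used in the paper.
    """
    block_size = -(-num_vertices // num_blocks)  # ceiling division

    def block_of(v):
        return v // block_size

    # Orient each edge from the lower to the higher vertex id.
    # out_nbrs[b][u] holds the higher-id neighbors of u that fall in block b.
    out_nbrs = [dict() for _ in range(num_blocks)]
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        lo, hi = min(a, b), max(a, b)
        out_nbrs[block_of(hi)].setdefault(lo, set()).add(hi)

    def run_task(i, j, k):
        # Count triangles u < v < w with u in block i, v in block j, w in block k.
        count = 0
        for u, nbrs_in_j in out_nbrs[j].items():
            if block_of(u) != i:
                continue
            w_of_u = out_nbrs[k].get(u)
            if not w_of_u:
                continue
            for v in nbrs_in_j:
                w_of_v = out_nbrs[k].get(v)
                if w_of_v:
                    # Common higher-id neighbors of u and v inside block k.
                    count += len(w_of_u & w_of_v)
        return count

    # Each task touches only a few blocks, so tasks could be streamed to
    # different devices; here they simply run one after another.
    per_task = {
        (i, j, k): run_task(i, j, k)
        for i, j, k in combinations_with_replacement(range(num_blocks), 3)
    }
    return sum(per_task.values()), per_task


if __name__ == "__main__":
    # Complete graph on 4 vertices plus one pendant edge.
    k4_plus = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    total, tasks = count_triangles_blocked(k4_plus, num_vertices=5, num_blocks=3)
    print(total, {t: c for t, c in tasks.items() if c})
```

Because the per-task counts are disjoint, the total does not depend on the number of blocks; only the granularity of the work changes.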
Yasar, Abdurrahman, Rajamanickam, Sivasankaran, Berry, Jonathan W., & Catalyurek, Umit V. (2022). A Block-Based Triangle Counting Algorithm on Heterogeneous Environments. IEEE Transactions on Parallel and Distributed Systems, 33(2). https://doi.org/10.1109/tpds.2021.3093240
Yasar, Abdurrahman, Rajamanickam, Sivasankaran, Berry, Jonathan W., et al., "A Block-Based Triangle Counting Algorithm on Heterogeneous Environments," IEEE Transactions on Parallel and Distributed Systems 33, no. 2 (2022), https://doi.org/10.1109/tpds.2021.3093240
@article{osti_1810367,
author = {Yasar, Abdurrahman and Rajamanickam, Sivasankaran and Berry, Jonathan W. and Catalyurek, Umit V.},
title = {A Block-Based Triangle Counting Algorithm on Heterogeneous Environments},
annote = {Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.},
doi = {10.1109/tpds.2021.3093240},
url = {https://www.osti.gov/biblio/1810367},
journal = {IEEE Transactions on Parallel and Distributed Systems},
issn = {1045-9219},
number = {2},
volume = {33},
place = {United States},
publisher = {IEEE},
year = {2022},
month = {01}}
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
Grant/Contract Number:
AC04-94AL85000; NA0003525
OSTI ID:
1810367
Report Number(s):
SAND--2021-7901J; 697218
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 33, Issue 2; ISSN 1045-9219