OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

Journal Article · IEEE Transactions on Parallel and Distributed Systems
 [1];  [2];  [2];  [1]
  1. Georgia Institute of Technology, Atlanta, GA (United States)
  2. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm that reduces data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. Partitioning the adjacency matrix of a graph is a well-studied problem; our task decomposition goes one step further and partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit in a single device's memory. We demonstrate the effectiveness of our approach with an implementation on a compute node with multiple sockets, cores, and GPUs. The current state of the art in triangle enumeration processes the Friendster graph in 2.1 seconds, excluding the time to copy data between CPU and GPU. By that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds, which is 5.6 times faster than the fastest published CPU-only time.
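The idea of a task decomposition that partitions the set of triangles, not just the matrix, can be illustrated with a small sketch. It assumes a 1D partition of vertices into contiguous ID ranges given by a hypothetical `cuts` list, stores each undirected edge once in the strictly lower-triangular adjacency matrix, and treats each triple of block indices (i, j, k) with i >= j >= k as one task. Every triangle a > b > c then lands in exactly one task, and each task reads only three blocks. The function names, the set-intersection kernel, and the thread pool standing in for CPU/GPU devices are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def part(v, cuts):
    """Part index of vertex v; cuts must be sorted ascending.
    Part p holds vertices in [cuts[p-1], cuts[p])."""
    return bisect_right(cuts, v)


def make_blocks(edges, cuts):
    """Store each undirected edge once as (a, b) with a > b (strictly lower
    triangle) and bucket it into block (part(a), part(b)) as {row: set(cols)}."""
    blocks = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = max(u, v), min(u, v)
        key = (part(a, cuts), part(b, cuts))
        blocks.setdefault(key, {}).setdefault(a, set()).add(b)  # sets drop duplicate edges
    return blocks


def count_task(blocks, i, j, k):
    """Count triangles a > b > c with a in part i, b in part j, c in part k.
    The task reads only three blocks: L[i][j], L[i][k] and L[j][k]."""
    l_ij = blocks.get((i, j), {})
    l_ik = blocks.get((i, k), {})
    l_jk = blocks.get((j, k), {})
    total = 0
    for a, bs in l_ij.items():
        cs_a = l_ik.get(a)
        if not cs_a:
            continue
        for b in bs:
            cs_b = l_jk.get(b)
            if cs_b:
                total += len(cs_a & cs_b)  # common lower neighbors c of a and b
    return total


def block_triangle_count(edges, cuts, workers=4):
    """Enumerate tasks (i, j, k) with i >= j >= k; each triangle falls in
    exactly one task. A thread pool stands in for streaming tasks to devices."""
    blocks = make_blocks(edges, cuts)
    p = len(cuts) + 1
    tasks = [(i, j, k) for i in range(p) for j in range(i + 1) for k in range(j + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda t: count_task(blocks, *t), tasks))


def brute_force_triangles(edges, n):
    """Reference O(n^3) count, used only to check the block version."""
    es = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    return sum(1 for a, b, c in combinations(range(n), 3)
               if (a, b) in es and (a, c) in es and (b, c) in es)


if __name__ == "__main__":
    k4 = list(combinations(range(4), 2))
    print(block_triangle_count(k4, cuts=[2]))  # 4 triangles in K4
```

Changing `cuts` changes the number and size of the tasks but not the total. That property is what makes streaming possible in this sketch: a worker only needs the three blocks its current task touches to be resident, not the whole graph.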

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF)
Grant/Contract Number:
AC04-94AL85000; CF-1919021; NA0003525
OSTI ID:
1810367
Report Number(s):
SAND-2021-7901J; 697218
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 33, Issue 2; ISSN 1045-9219
Publisher:
IEEE
Country of Publication:
United States
Language:
English


Similar Records

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments
Technical Report · October 1, 2020

Trust: Triangle Counting Reloaded on GPUs
Journal Article · March 9, 2021 · IEEE Transactions on Parallel and Distributed Systems

Wedge sampling for computing clustering coefficients and triangle counts on large graphs
Journal Article · May 8, 2014 · Statistical Analysis and Data Mining