DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis

Journal Article · International Journal of Parallel Programming

Abstract

Streaming graph processing performs batched updates and analytics on a time-evolving graph. The underlying representation format of the graph largely determines the throughput of both the update and the analytics phases. Existing representation formats usually employ variations of hash tables or adjacency lists. However, a recent study showed that adjacency-list-based approaches perform poorly on heavy-tailed graphs, whereas hash-table-based approaches suffer on short-tailed graphs. We propose GraphTango, a hybrid representation format that provides excellent update and analytics throughput regardless of the graph's degree distribution. GraphTango dynamically switches among three formats based on a vertex's degree: (1) low-degree vertices store their edges directly alongside the neighborhood metadata, confining accesses to a single cache line; (2) medium-degree vertices use adjacency lists; and (3) high-degree vertices use both hash tables and adjacency lists. For these high-degree vertices, the adjacency list provides fast traversal during the analytics phase, while the hash table provides constant-time lookups during the update phase. We further optimized performance by designing an open-addressing-based hash table that fully utilizes every fetched cache line. In addition, we developed a thread-local, lock-free memory pool that lets adjacency lists and hash tables grow and shrink quickly in a multi-threaded environment. We evaluated GraphTango with the SAGA-Bench framework and compared it with four other representation formats: Stinger, Degree-aware Robin Hood Hashing, and two adjacency-list-based formats with different workload-balancing schemes. On average, GraphTango provides 4.5x higher insertion throughput, 3.2x higher deletion throughput, and 1.1x higher analytics throughput than the next-best format. Furthermore, we integrated GraphTango with the state-of-the-art graph processing frameworks DZiG and RisGraph. Compared with vanilla DZiG and vanilla RisGraph, GraphTango + DZiG and GraphTango + RisGraph reduce the average batch processing time by 2.3x and 1.5x, respectively.
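The degree-based switching described in the abstract can be illustrated with a small, single-threaded sketch of one vertex's neighborhood. This is not the authors' implementation. The class name `Vertex` and the thresholds `INLINE_CAPACITY` and `HASH_THRESHOLD` are hypothetical, and their values are assumptions rather than figures from the paper. A short Python list stands in for the cache-line-resident inline edges, and a built-in `dict` stands in for the paper's cache-line-optimized open-addressing hash table. The lock-free memory pool, multithreading, and demotion of shrinking vertices to a lower tier are all omitted.

```python
# Illustrative sketch of a three-tier, degree-dependent neighborhood.
# INLINE_CAPACITY and HASH_THRESHOLD are hypothetical values (assumptions).
INLINE_CAPACITY = 4   # assumed number of edges that fit beside the metadata
HASH_THRESHOLD = 64   # assumed degree above which a hash index is added


class Vertex:
    """Out-neighborhood of one vertex; the representation depends on degree."""

    def __init__(self):
        self.inline = []   # low tier: small array (stand-in for one cache line)
        self.adj = None    # medium/high tier: dense adjacency list
        self.index = None  # high tier: neighbor -> position in self.adj

    @property
    def degree(self):
        return len(self.adj) if self.adj is not None else len(self.inline)

    def tier(self):
        if self.adj is None:
            return "low"
        return "high" if self.index is not None else "medium"

    def find(self, dst):
        """Position of dst in the edge array, or -1 if absent."""
        if self.index is not None:        # high tier: hash lookup
            return self.index.get(dst, -1)
        edges = self.adj if self.adj is not None else self.inline
        for i, d in enumerate(edges):     # low/medium tier: linear scan
            if d == dst:
                return i
        return -1

    def insert(self, dst):
        if self.find(dst) != -1:
            return False                  # duplicate edge: ignore
        if self.adj is None:
            if len(self.inline) < INLINE_CAPACITY:
                self.inline.append(dst)
                return True
            # Inline storage is full: promote to an adjacency list.
            self.adj, self.inline = self.inline, []
        self.adj.append(dst)
        if self.index is not None:
            self.index[dst] = len(self.adj) - 1
        elif len(self.adj) > HASH_THRESHOLD:
            # Promote to the high tier: keep the list and add a hash index.
            self.index = {d: i for i, d in enumerate(self.adj)}
        return True

    def delete(self, dst):
        pos = self.find(dst)
        if pos == -1:
            return False
        edges = self.adj if self.adj is not None else self.inline
        last = edges.pop()
        if pos < len(edges):              # swap-with-last keeps the list dense
            edges[pos] = last
            if self.index is not None:
                self.index[last] = pos
        if self.index is not None:
            del self.index[dst]
        return True

    def neighbors(self):
        """Dense traversal used by the analytics phase."""
        return list(self.adj if self.adj is not None else self.inline)


if __name__ == "__main__":
    v = Vertex()
    for d in range(100):
        v.insert(d)
    v.delete(5)
    print(v.tier(), v.degree)
```

Keeping the adjacency list even after the hash index is added reflects the trade-off stated in the abstract. Analytics scan the dense list, while updates use the index to locate or remove an edge without a linear scan. The swap-with-last deletion is one simple way to keep the list dense. It is an assumption of this sketch, not necessarily the paper's scheme.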

Sponsoring Organization:
USDOE
OSTI ID:
2352367
Journal Information:
International Journal of Parallel Programming, Vol. 52, Issue 3; ISSN 0885-7458
Publisher:
Springer Science + Business Media
Country of Publication:
United States
Language:
English

References (19)

Robin Hood Hashing · Conference · January 1985
Towards a Distributed Large-Scale Dynamic Graph Data Store · Conference · May 2016
Navigation Graph for Tiled Media Streaming · Conference · October 2019
DZiG · Conference · April 2021
GraphMat: high performance graph analytics made productive · Journal Article · July 2015
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs · Journal Article · February 2008
RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s · Conference · June 2021
Knowledge Discovery from Social Graph Data · Journal Article · January 2016
Graphicionado: A high-performance and energy-efficient accelerator for graph analytics · Conference · October 2016
GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs · Conference · November 2021
Optimizing Vertex Pressure Dynamic Graph Partitioning in Many-Core Systems · Journal Article · June 2021
STINGER: High performance data structure for streaming graphs · Conference · September 2012
Chronos · Conference · April 2014
How to apply de Bruijn graphs to genome assembly · Journal Article · November 2011
Kineograph · Conference · April 2012
Drowning in data: digital library architecture to support scientific use of embedded sensor networks · Conference · January 2007
GraphTinker: A High Performance Data Structure for Dynamic Graph Processing · Conference · May 2019
Pixie · Conference · January 2018
SAGA-Bench: Software and Hardware Characterization of Streaming Graph Analytics Workloads · Conference · August 2020

Similar Records

Theoretically and practically efficient parallel nucleus decomposition
Journal Article · 2021 · Proceedings of the VLDB Endowment · OSTI ID:1980995

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · 2019 · OSTI ID:1576175

Advanced bandwidth scheduling algorithms in dedicated networks
Journal Article · 2009 · International Journal of Distributed Sensor Networks · OSTI ID:982143
