GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis
Abstract
Streaming graph processing performs batched updates and analytics on a time-evolving graph. The underlying representation format of the graph largely determines the throughput of both the update and the analytics phases. Existing representation formats usually employ variations of hash tables or adjacency lists. However, a recent study showed that adjacency-list-based approaches perform poorly on heavy-tailed graphs, while hash-table-based approaches suffer on short-tailed graphs. We propose GraphTango, a hybrid representation format that provides excellent update and analytics throughput regardless of the graph's degree distribution. GraphTango dynamically switches among three different formats based on a vertex's degree: (1) low-degree vertices store the edges directly with the neighborhood metadata, confining accesses to a single cache line; (2) medium-degree vertices use adjacency lists; and (3) high-degree vertices use hash tables as well as adjacency lists. For high-degree vertices, the adjacency list provides fast traversal during the analytics phase, while the hash table provides constant-time lookups during the update phase. We further optimized performance by designing an open-addressing-based hash table that fully utilizes every fetched cache line. In addition, we developed a thread-local, lock-free memory pool that allows the adjacency lists and hash tables to grow and shrink quickly in a multi-threaded environment. We evaluated GraphTango with the help of the SAGA-Bench framework and compared it with four other representation formats: Stinger, Degree-aware Robin Hood Hashing, and two adjacency-list-based formats with different workload balancing schemes. On average, GraphTango provides 4.5x higher insertion throughput, 3.2x higher deletion throughput, and 1.1x higher analytics throughput than the next best format. Furthermore, we integrated GraphTango with the state-of-the-art graph processing frameworks DZiG and RisGraph. Compared to vanilla DZiG and vanilla RisGraph, [GraphTango + DZiG] and [GraphTango + RisGraph] reduce the average batch processing time by 2.3x and 1.5x, respectively.
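The degree-based switching described in the abstract can be illustrated with a minimal Python sketch. It is not GraphTango's implementation: the class name `VertexNeighbors`, the thresholds `INLINE_MAX` and `LIST_MAX`, and the swap-with-last deletion are assumptions chosen for illustration. Python cannot express cache-line placement, so the low-degree tier is represented only by a label. A built-in `dict` stands in for the paper's open-addressing hash table, and the memory pool is omitted.

```python
# Illustrative sketch only: names and thresholds are assumptions,
# not values taken from GraphTango.
INLINE_MAX = 4    # hypothetical: edges that would fit beside the vertex metadata
LIST_MAX = 64     # hypothetical: above this degree, a hash index is added


class VertexNeighbors:
    """Neighbors of one vertex, stored in a degree-dependent layout."""

    def __init__(self):
        self.edges = []     # contiguous neighbor array, used for traversal
        self.index = None   # neighbor -> position in edges (high degree only)

    def tier(self):
        """Report which of the three layouts the current degree selects."""
        n = len(self.edges)
        if n <= INLINE_MAX:
            return "inline"
        if n <= LIST_MAX:
            return "list"
        return "list+hash"

    def _find(self, dst):
        # High degree: constant-time lookup through the hash index.
        if self.index is not None:
            return self.index.get(dst, -1)
        # Low or medium degree: a linear scan over a short array.
        for i, v in enumerate(self.edges):
            if v == dst:
                return i
        return -1

    def insert(self, dst):
        if self._find(dst) != -1:
            return False                      # duplicate edge, nothing to do
        self.edges.append(dst)
        if self.index is not None:
            self.index[dst] = len(self.edges) - 1
        elif len(self.edges) > LIST_MAX:
            # Degree crossed the threshold: build the hash index once.
            self.index = {v: i for i, v in enumerate(self.edges)}
        return True

    def delete(self, dst):
        pos = self._find(dst)
        if pos == -1:
            return False
        last = self.edges.pop()               # swap-with-last keeps the array dense
        if pos < len(self.edges):
            self.edges[pos] = last
            if self.index is not None:
                self.index[last] = pos
        if self.index is not None:
            del self.index[dst]
            if len(self.edges) <= LIST_MAX:
                self.index = None             # shrink back to the list-only tier
        return True

    def neighbors(self):
        """Traversal for the analytics phase always walks the dense array."""
        return iter(self.edges)
```

In this sketch, updates on a high-degree vertex go through the index while analytics traversals read only the dense array, which mirrors the division of labor the abstract describes for the third tier.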
- Sponsoring Organization:
- USDOE
- OSTI ID:
- 2352367
- Journal Information:
- International Journal of Parallel Programming, Vol. 52, Issue 3; ISSN 0885-7458
- Publisher:
- Springer Science + Business Media
- Country of Publication:
- United States
- Language:
- English