OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

Journal Article · IEEE Transactions on Knowledge and Data Engineering
Shi, Xuanhua [1]; Luo, Xuan [1]; Liang, Junling [1]; Zhao, Peng [1]; Di, Sheng [2]; He, Bingsheng [3]; Jin, Hai [1]
  1. Huazhong Univ. of Science and Technology, Wuhan (China). Services Computing Technology and System Lab., Big Data Technology and System Lab., School of Computer Science and Technology
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. National Univ. of Singapore (Singapore). Dept. of Computer Science

GPUs have been increasingly used to accelerate graph processing for computationally demanding problems in graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate iterative convergence. Unfortunately, maintaining consistency under asynchronous computing requires locking or atomic operations, which incur significant overhead on GPUs. Graph coloring is therefore adopted to separate vertices with potential update conflicts, guaranteeing the consistency and correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because a large number of colors is generally required to process a large-scale graph with billions of vertices. We propose Frog, a lightweight asynchronous processing framework with a hybrid coloring model applied during preprocessing. The fundamental idea is a Pareto-principle (80-20 rule) observation drawn from a large number of real-world graph coloring cases: a majority of vertices (about 80%) are covered by only a few colors, so they can be read and updated with a very high degree of parallelism without violating sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices of a sparse graph to maximize parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers over PCIe while processing each partition. We conduct experiments on real-world datasets (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk, and Twitter) to evaluate our approach, comparing it with well-known non-preprocessing approaches (Totem, Medusa, MapGraph, and Gunrock) and a preprocessing-based approach (CuSha) on four classical algorithms (BFS, PageRank, SSSP, and CC). Across the tested applications and datasets, Frog significantly outperforms existing GPU-based graph processing systems, with two exceptions: MapGraph achieves better performance than Frog when running BFS on RoadNet-CA, and the comparison with Gunrock is mixed. Frog outperforms Gunrock by more than 1.04x when running PageRank and SSSP, while its advantage is less pronounced for BFS and CC on some datasets, especially RoadNet-CA.
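
The hybrid coloring step described in the abstract can be illustrated with a short sketch. The Python fragment below is not the authors' implementation; the function name hybrid_coloring, the max_colors cap, and the dictionary-based graph representation are illustrative assumptions. It shows the general idea of a relaxed greedy coloring that keeps only a few conflict-free partitions and lumps all remaining vertices into a catch-all partition.

from collections import defaultdict

def hybrid_coloring(adjacency, max_colors=4):
    """Relaxed greedy coloring: adjacency maps each vertex to its neighbors."""
    color_of = {}
    partitions = defaultdict(list)   # color id -> vertices in that partition
    hybrid = max_colors              # index of the catch-all ("hybrid") partition
    for v in adjacency:
        used = {color_of[u] for u in adjacency[v] if u in color_of}
        # first small color not taken by an already-colored neighbor, else hybrid
        c = next((c for c in range(max_colors) if c not in used), hybrid)
        color_of[v] = c
        partitions[c].append(v)
    return partitions

# Example: a triangle {0, 1, 2} plus a pendant vertex 3 attached to 0.
g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(dict(hybrid_coloring(g, max_colors=2)))   # {0: [0], 1: [1, 3], 2: [2]}

Vertices within each of the partitions 0..max_colors-1 share no edges, so each such partition can be updated on the GPU in parallel without locks; only the catch-all partition still requires conflict handling (e.g., atomic operations or a synchronous pass). This is consistent with the 80-20 split the abstract describes, though the paper's actual coloring procedure may differ.
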

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Natural Science Foundation of China (NSFC); USDOE Office of Science (SC)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1416975
Journal Information:
IEEE Transactions on Knowledge and Data Engineering, Vol. 30, Issue 1; ISSN 1041-4347
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 24 works (citation information provided by Web of Science)

Cited By (6)

Improving parallel efficiency for asynchronous graph analytics using Gauss‐Seidel‐based matrix computation journal April 2019
WolfPath: Accelerating Iterative Traversing-Based Graph Processing Algorithms on GPU journal November 2017
L-PowerGraph: a lightweight distributed graph-parallel communication mechanism journal April 2018
Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation conference February 2018
  • Wang, Lei; Zhuang, Liangji; Chen, Junhang
  • PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. https://doi.org/10.1145/3178487.3178508
Efficient Tensor Sensing for RF Tomographic Imaging on GPUs journal February 2019