Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model
Abstract
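GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a lightweight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets especially for RoadNet-CA.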
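The color-based separation described in the abstract can be illustrated with a small sketch. This is not the Frog implementation: it assumes a plain sequential greedy coloring, and the names `greedy_coloring`, `hybrid_partitions`, and the `max_colors` parameter are hypothetical. It also assumes that vertices outside the first few colors are merged into one residual partition that would need atomic or serialized updates; the abstract only states that processing is separated by the distribution of colors.

```python
from collections import defaultdict


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def hybrid_partitions(adj, max_colors):
    """Split vertices into conflict-free color classes plus one residual set.

    Assumption (not from the source): colors 0..max_colors-1 each form a
    partition whose vertices share no edge, so they could be updated in
    parallel without locks; all other vertices go into a single residual
    partition that would require atomic or serialized updates.
    """
    by_color = defaultdict(list)
    for v, c in greedy_coloring(adj).items():
        by_color[c].append(v)
    independent = [by_color[c] for c in range(max_colors) if c in by_color]
    residual = [v for c in sorted(by_color) if c >= max_colors for v in by_color[c]]
    return independent, residual


# Hypothetical undirected graph: a triangle (0, 1, 2), a pendant vertex 3,
# and an isolated vertex 4.
example_adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0], 4: []}
independent, residual = hybrid_partitions(example_adj, max_colors=2)
```

With `max_colors=2`, four of the five vertices in this toy graph fall into the two lock-free partitions and only one vertex remains in the residual set, which mirrors the skewed color distribution the abstract reports.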
 Authors:
 Shi, Xuanhua; Luo, Xuan; Liang, Junling; Zhao, Peng; Di, Sheng; He, Bingsheng; Jin, Hai
 Publication Date:
 January 2018
 Research Org.:
 Argonne National Lab. (ANL), Argonne, IL (United States)
 Sponsoring Org.:
 National Natural Science Foundation of China (NNSFC); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
 OSTI Identifier:
 1416975
 DOE Contract Number:
 AC02-06CH11357
 Resource Type:
 Journal Article
 Resource Relation:
 Journal Name: IEEE Transactions on Knowledge and Data Engineering; Journal Volume: 30; Journal Issue: 1
 Country of Publication:
 United States
 Language:
 English
 Subject:
 Asynchronous Computing Model; GPGPU; Graph Processing
Citation Formats
Shi, Xuanhua, Luo, Xuan, Liang, Junling, Zhao, Peng, Di, Sheng, He, Bingsheng, and Jin, Hai. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model. United States: N. p., 2018.
Web. doi:10.1109/TKDE.2017.2745562.
Shi, Xuanhua, Luo, Xuan, Liang, Junling, Zhao, Peng, Di, Sheng, He, Bingsheng, & Jin, Hai. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model. United States. doi:10.1109/TKDE.2017.2745562.
Shi, Xuanhua, Luo, Xuan, Liang, Junling, Zhao, Peng, Di, Sheng, He, Bingsheng, and Jin, Hai. 2018.
"Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model". United States.
doi:10.1109/TKDE.2017.2745562.
@article{osti_1416975,
title = {Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model},
author = {Shi, Xuanhua and Luo, Xuan and Liang, Junling and Zhao, Peng and Di, Sheng and He, Bingsheng and Jin, Hai},
abstractNote = {GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a lightweight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC).
On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets especially for RoadNet-CA.},
doi = {10.1109/TKDE.2017.2745562},
journal = {IEEE Transactions on Knowledge and Data Engineering},
number = 1,
volume = 30,
place = {United States},
year = 2018,
month = 1
}

Graph algorithms are challenging to parallelize when high performance and scalability are primary goals. Low concurrency, poor data locality, irregular access patterns, and a high ratio of data access to computation are among the chief reasons for the challenge. The performance implication of these features is exacerbated on distributed-memory machines. More success is being achieved on shared-memory, multicore architectures supporting multithreading. We consider a prototypical graph problem, coloring, and show how a greedy algorithm for solving it can be effectively parallelized on multithreaded architectures. We present in particular two different parallel algorithms. The first relies on speculation and iteration, and is …
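As a rough illustration of what coloring by "speculation and iteration" can look like, the sketch below simulates the idea sequentially. It is not the algorithm from the record above, whose details are truncated here. It assumes that in each round every uncolored vertex picks a color while seeing only the colors fixed in earlier rounds, and that conflicts are resolved by recoloring the higher-numbered endpoint of a conflicting edge. The function name `speculative_coloring` and the tie-break rule are hypothetical.

```python
def speculative_coloring(adj):
    """Color a graph in rounds of tentative coloring and conflict repair.

    adj maps each vertex to a list of its neighbors (undirected graph).
    Returns the final coloring and the number of rounds used.
    """
    color = {}
    pending = sorted(adj)
    rounds = 0
    while pending:
        rounds += 1
        # Speculation: each pending vertex sees only colors fixed in earlier
        # rounds, mimicking threads that cannot see each other's writes.
        snapshot = dict(color)
        for v in pending:
            used = {snapshot[u] for u in adj[v] if u in snapshot}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        # Conflict detection: on a same-colored edge, the higher-numbered
        # endpoint gives up its color and retries in the next round.
        conflicts = [
            v for v in pending
            if any(u < v and color.get(u) == color[v] for u in adj[v])
        ]
        for v in conflicts:
            del color[v]
        pending = conflicts
    return color, rounds


# Hypothetical example: a triangle, the worst case for this scheme because
# all three vertices conflict in the first round.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
triangle_colors, triangle_rounds = speculative_coloring(triangle)
```

The loop terminates because the lowest-numbered pending vertex never loses a conflict, so at least one vertex is finalized per round.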

A parallel graph coloring heuristic
The problem of computing good graph colorings arises in many diverse applications, such as in the estimation of sparse Jacobians and in the development of efficient, parallel iterative methods for solving sparse linear systems. This paper presents an asynchronous graph coloring heuristic well suited to distributed memory parallel computers. Experimental results obtained on an Intel iPSC/860 are presented, which demonstrate that, for graphs arising from finite element applications, the heuristic exhibits scalable performance and generates colorings usually within three or four colors of the best-known linear time sequential heuristics. For bounded degree graphs, it is shown that the expected running …
An asynchronous traversal engine for graph-based rich metadata management
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance …