An asynchronous traversal engine for graph-based rich metadata management

Dai, Dong; Carns, Philip; Ross, Robert B.; Jenkins, John; Muirhead, Nicholas; Chen, Yong

doi:10.1016/j.parco.2016.06.002

Journal Article · Thu Jun 23 2016 · Parallel Computing

DOI: https://doi.org/10.1016/j.parco.2016.06.002 · OSTI ID: 1333002

Dai, Dong ^[1]; Carns, Philip ^[2]; Ross, Robert B. ^[2]; Jenkins, John ^[2]; Muirhead, Nicholas ^[1]; Chen, Yong ^[1]

[1] Texas Tech Univ., Lubbock, TX (United States)
[2] Argonne National Lab. (ANL), Argonne, IL (United States)

Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. However, such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format.
We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
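
The contrast the abstract draws between level-synchronous and asynchronous traversal can be illustrated with a minimal single-process sketch. This is not the authors' engine or query language: the toy graph, the edge labels ("ran", "wrote"), and the function names `level_synchronous` and `asynchronous` are all hypothetical, chosen only to show where the per-step barrier sits and how it can be removed.

```python
from collections import deque

# Hypothetical toy property graph (not from the paper):
# vertices carry a type property, edges carry a label.
VERTICES = {
    "alice": {"type": "user"},
    "job1": {"type": "job"},
    "job2": {"type": "job"},
    "a.dat": {"type": "file"},
    "b.dat": {"type": "file"},
}
EDGES = {
    "alice": [("ran", "job1"), ("ran", "job2")],
    "job1": [("wrote", "a.dat")],
    "job2": [("wrote", "a.dat"), ("wrote", "b.dat")],
}


def level_synchronous(start, labels):
    """Follow labels[i] at step i; step i finishes entirely before step i+1."""
    frontier = {start}
    for label in labels:
        next_frontier = set()
        for vertex in frontier:
            for edge_label, dst in EDGES.get(vertex, []):
                if edge_label == label:
                    next_frontier.add(dst)
        # In a distributed setting, this assignment is where every server
        # would wait for the slowest one (the straggler) before moving on.
        frontier = next_frontier
    return frontier


def asynchronous(start, labels):
    """Each (vertex, step) work item advances on its own; no per-step barrier."""
    work = deque([(start, 0)])
    seen = {(start, 0)}
    results = set()
    while work:
        vertex, step = work.popleft()
        if step == len(labels):
            results.add(vertex)  # this path has completed every step
            continue
        for edge_label, dst in EDGES.get(vertex, []):
            item = (dst, step + 1)
            if edge_label == labels[step] and item not in seen:
                seen.add(item)  # avoid re-expanding the same vertex at the same step
                work.append(item)
    return results


if __name__ == "__main__":
    query = ["ran", "wrote"]  # files written by jobs that alice ran
    print(sorted(level_synchronous("alice", query)))
    print(sorted(asynchronous("alice", query)))
```

Both functions return the same set of files for the same query. The difference is only in control flow: the asynchronous version tags each work item with its own step number, so in a distributed deployment a fast server could proceed to later steps while a slow one is still finishing an earlier step.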

Research Organization: Argonne National Laboratory (ANL), Argonne, IL (United States)

Sponsoring Organization: USDOE Office of Science (SC); National Science Foundation (NSF)

Grant/Contract Number: AC02-06CH11357

OSTI ID: 1333002

Alternate ID(s): OSTI ID: 1359729

Journal Information: Parallel Computing, Vol. 58, Issue C; ISSN 0167-8191

Publisher: Elsevier

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 2 works

Citation information provided by Web of Science

Similar Records

Scalable Pattern Matching in Metadata Graphs via Constraint Checking

Journal Article · Mon Jan 04 00:00:00 EST 2021 · ACM Transactions on Parallel Computing · OSTI ID:1333002

Reza, Tahsin; Halawa, Hassan; Ripeanu, Matei; +2 more

GraphMeta: Managing HPC Rich Metadata in Graphs

Conference · Fri Jan 01 00:00:00 EST 2016 · OSTI ID:1333002

Dai, Dong; Chen, Yong; Carns, Philip; +3 more

Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model

Journal Article · Tue Dec 18 00:00:00 EST 2018 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1333002

Dai, Dong; Chen, Yong; Carns, Philip; +3 more

Related Subjects

97 MATHEMATICS AND COMPUTING
96 KNOWLEDGE MANAGEMENT AND PRESERVATION
graph partitioning
graph traversal
parallel file systems
property graph
rich metadata management
