OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An asynchronous traversal engine for graph-based rich metadata management

Journal Article · Parallel Computing
 [1];  [2];  [2];  [2];  [1];  [1]
  1. Texas Tech Univ., Lubbock, TX (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)

Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model for representing heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level-synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This approach performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. However, such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format.
We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
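
The contrast between level-synchronous and asynchronous traversal described in the abstract can be illustrated with a minimal single-process sketch. This is not the engine from the article: the toy graph, the edge labels (`runs`, `writes`), and the function names `sync_traverse` and `async_traverse` are all hypothetical, and a real distributed engine would spread the work items across servers. The sketch only shows that the synchronous version finishes a whole frontier before starting the next step, whereas the asynchronous version lets each (vertex, step) work item advance independently.

```python
from collections import deque

# Hypothetical toy property graph: vertices carry properties, edges carry labels.
VERTICES = {
    "alice": {"type": "user"},
    "job1": {"type": "job"},
    "job2": {"type": "job"},
    "a.dat": {"type": "file"},
    "b.dat": {"type": "file"},
}
EDGES = {
    ("alice", "runs"): ["job1", "job2"],
    ("job1", "writes"): ["a.dat"],
    ("job2", "writes"): ["a.dat", "b.dat"],
}


def neighbors(vertex, label):
    """Return the vertices reachable from `vertex` over edges with `label`."""
    return EDGES.get((vertex, label), [])


def sync_traverse(start, labels):
    """Level-synchronous: the whole frontier completes before the next step."""
    frontier = {start}
    for label in labels:
        next_frontier = set()
        for v in frontier:
            next_frontier.update(neighbors(v, label))
        # Stands in for the global barrier: no vertex enters the next step
        # until every vertex of this step has been expanded.
        frontier = next_frontier
    return frontier


def async_traverse(start, labels):
    """Asynchronous: each (vertex, step) work item proceeds on its own."""
    results = set()
    seen = {(start, 0)}          # avoid re-expanding the same work item
    work = deque([(start, 0)])
    while work:
        v, step = work.pop()     # LIFO on purpose: items from different steps interleave
        if step == len(labels):
            results.add(v)       # this path has completed all steps
            continue
        for n in neighbors(v, labels[step]):
            item = (n, step + 1)
            if item not in seen:
                seen.add(item)
                work.append(item)
    return results


if __name__ == "__main__":
    query = ["runs", "writes"]   # files written by jobs that alice ran
    print(sorted(sync_traverse("alice", query)))
    print(sorted(async_traverse("alice", query)))
```

Both functions return the same set of files for the same query. The difference is in scheduling: in the asynchronous version a slow vertex delays only the paths that pass through it, not the entire step.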

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC); National Science Foundation (NSF)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1333002
Alternate ID(s):
OSTI ID: 1359729
Journal Information:
Parallel Computing, Vol. 58, Issue C; ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)

Similar Records

Scalable Pattern Matching in Metadata Graphs via Constraint Checking
Journal Article · Mon Jan 04 2021 · ACM Transactions on Parallel Computing · OSTI ID:1333002

GraphMeta: Managing HPC Rich Metadata in Graphs
Conference · Fri Jan 01 2016 · OSTI ID:1333002

Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model
Journal Article · Tue Dec 18 2018 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1333002