skip to main content

Title: Evaluation of Graph Pattern Matching Workloads in Graph Analysis Systems

Graph analysis has emerged as a powerful method for data scientists to represent, integrate, query, and explore heterogeneous data sources. As a result, graph data management and mining became a popular area of research, and led to the development of plethora of systems in recent years. Unfortunately, the number of emerging graph analysis systems and the wide range of applications, coupled with a lack of apples-to-apples comparisons, make it difficult to understand the trade-offs between different systems and the graph operations for which they are designed. A fair comparison of these systems is a challenging task for the following reasons: multiple data models, non-standardized serialization formats, various query interfaces to users, and diverse environments they operate in. To address these key challenges, in this paper we present a new benchmark suite by extending the Lehigh University Benchmark (LUBM) to cover the most common capabilities of various graph analysis systems. We provide the design process of the benchmark, which generalizes the workflow for data scientists to conduct the desired graph analysis on different graph analysis systems. Equipped with this extended benchmark suite, we present performance comparison for nine subgraph pattern retrieval operations over six graph analysis systems, namely NetworkX, Neo4j, Jena,more » Titan, GraphX, and uRiKA. Through the proposed benchmark suite, this study reveals both quantitative and qualitative findings in (1) implications in loading data into each system; (2) challenges in describing graph patterns for each query interface; and (3) different sensitivity of each system to query selectivity. We envision that this study will pave the road for: (i) data scientists to select the suitable graph analysis systems, and (ii) data management system designers to advance graph analysis systems.« less
 [1] ;  [2] ;  [3] ;  [3] ;  [1]
  1. North Carolina State University (NCSU), Raleigh
  2. (Matt) [ORNL
  3. ORNL
Publication Date:
OSTI Identifier:
DOE Contract Number:
Resource Type:
Resource Relation:
Conference: 2016 International ACM Symposium on High-Performance Parallel and Distributed Computing, Kyoto, Japan, 20160531, 20160531
Research Org:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org:
ORNL LDRD Director's R&D
Country of Publication:
United States