
Title: Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis

Journal Article · Expert Systems with Applications
 [1];  [1];  [2];  [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. North Carolina State Univ., Raleigh, NC (United States)

Graph analysis is now considered a promising technique for discovering useful knowledge in data from a new perspective. We envision two dimensions of graph analysis: OnLine Graph Analytic Processing (OLGAP), which focuses on subgraph pattern matching, and Graph Mining (GM), which focuses on automatic knowledge discovery in graphs. Because these two dimensions complement each other in solving complex problems, holistic in-situ graph analysis, which covers both OLGAP and GM in a single system, is critical for minimizing the burden of operating multiple graph systems and transferring intermediate result sets between them. Nevertheless, most existing graph analysis systems support only one of the two dimensions. In this work, we enable GM capabilities (e.g., PageRank, connected-component analysis, and node eccentricity) in RDF triplestores, which were originally developed to store RDF datasets and provide OLGAP capability. Specifically, we implemented six representative graph mining algorithms using SPARQL. This approach makes a wide range of available RDF datasets directly usable for holistic graph analysis within a single system. To validate the approach, we evaluate the performance of our implementations on nine real-world datasets in three computing environments: a laptop computer, an Amazon EC2 instance, and a shared-memory Cray XMT2 URIKA-GD graph-processing appliance. The experimental results show that our implementation provides promising and scalable performance for real-world graph analysis in all tested environments. The developed software is publicly available in an open-source project that we initiated.
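
As a rough illustration of the kind of computation involved, connected-component analysis over RDF triples can be sketched as iterative label propagation, which is one way such an algorithm might be phrased as repeated join-and-update steps over a triple set. This is not the authors' SPARQL implementation: the function name, the `ex:` node names, and the `ex:linksTo` predicate are all hypothetical, and plain Python stands in for a triplestore.

```python
from typing import Dict, Iterable, Tuple

# A triple is (subject, predicate, object), as in RDF.
Triple = Tuple[str, str, str]


def connected_components(triples: Iterable[Triple]) -> Dict[str, str]:
    """Label each node with the smallest node name in its component.

    Edges are treated as undirected and the predicate is ignored.
    (Assumption for illustration only, not taken from the paper.)
    """
    edges = [(s, o) for s, _p, o in triples]
    # Start with every node labelled by its own name.
    labels = {node: node for edge in edges for node in edge}

    changed = True
    while changed:  # repeat until a full pass changes nothing
        changed = False
        for s, o in edges:
            low = min(labels[s], labels[o])
            for node in (s, o):
                if labels[node] != low:
                    labels[node] = low
                    changed = True
    return labels


# Hypothetical toy dataset with two components.
triples = [
    ("ex:a", "ex:linksTo", "ex:b"),
    ("ex:b", "ex:linksTo", "ex:c"),
    ("ex:d", "ex:linksTo", "ex:e"),
]
labels = connected_components(triples)
print(labels)
```

Each pass of the loop plays the role of one query-and-update round; in a triplestore setting, the loop would presumably be driven by a client that reissues queries until the labels stop changing.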

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1237609
Alternate ID(s):
OSTI ID: 1396775
Journal Information:
Expert Systems with Applications, Vol. 48, Issue 1; ISSN 0957-4174
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 8 works
Citation information provided by Web of Science

Similar Records

Query optimization for graph analytics on linked data using SPARQL
Technical Report · Wed Jul 01 00:00:00 EDT 2015 · OSTI ID:1237609

TripleGraph
Software · Tue Apr 07 00:00:00 EDT 2020 · OSTI ID:1237609

Enabling Graph Appliance for Genome Assembly
Conference · Thu Jan 01 00:00:00 EST 2015 · OSTI ID:1237609