Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis
The graph analysis is now considered as a promising technique to discover useful knowledge in data with a new perspective. We envision that there are two dimensions of graph analysis: OnLine Graph Analytic Processing (OLGAP) and Graph Mining (GM) where each respectively focuses on subgraph pattern matching and automatic knowledge discovery in graph. Moreover, as these two dimensions aim to complementarily solve complex problems, holistic in-situ graph analysis which covers both OLGAP and GM in a single system is critical for minimizing the burdens of operating multiple graph systems and transferring intermediate result sets between those systems. Nevertheless, most existing graph analysis systems are only capable of one dimension of graph analysis. In this work, we take an approach to enabling GM capabilities (e.g., PageRank, connected-component analysis, node eccentricity, etc.) in RDF triplestores, which are originally developed to store RDF datasets and provide OLGAP capability. More specifically, to achieve our goal, we implemented six representative graph mining algorithms using SPARQL. The approach allows a wide range of available RDF data sets directly applicable for holistic graph analysis within a system. For validation of our approach, we evaluate performance of our implementations with nine real-world datasets and three different computing environments: a laptop computer, an Amazon EC2 instance, and a shared-memory Cray XMT2 URIKA-GD graph-processing appliance. The experimental results show that our implementation can provide promising and scalable performance for real world graph analysis in all tested environments. The developed software is publicly available in an open-source project that we initiated.
 Authors:
 Lee, Sangkeun ^{[1]}; Sukumar, Sreenivas R ^{[1]}; Hong, Seokyong ^{[2]}; Lim, SeungHwan ^{[1]}
 [1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 [2] North Carolina State Univ., Raleigh, NC (United States)
 Publication Date:
 Grant/Contract Number:
 AC05-00OR22725
 Type:
 Accepted Manuscript
 Journal Name:
 Expert Systems with Applications
 Additional Journal Information:
 Journal Volume: 48; Journal Issue: 1; Journal ID: ISSN 0957-4174
 Publisher:
 Elsevier
 Research Org:
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 Sponsoring Org:
 USDOE
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING; graph; mining; analysis; RDF; SPARQL; triplestore; semantic web
 OSTI Identifier:
 1237609
 Alternate Identifier(s):
 OSTI ID: 1396775
Lee, Sangkeun, Sukumar, Sreenivas R, Hong, Seokyong, and Lim, SeungHwan. Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis. United States: N. p., 2016. Web. doi:10.1016/j.eswa.2015.11.010.
Lee, Sangkeun, Sukumar, Sreenivas R, Hong, Seokyong, & Lim, SeungHwan. Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis. United States. doi:10.1016/j.eswa.2015.11.010.
Lee, Sangkeun, Sukumar, Sreenivas R, Hong, Seokyong, and Lim, SeungHwan. 2016. "Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis". United States. doi:10.1016/j.eswa.2015.11.010. https://www.osti.gov/servlets/purl/1237609.
@article{osti_1237609,
title = {Enabling Graph Mining in RDF Triplestores using SPARQL for Holistic In-situ Graph Analysis},
author = {Lee, Sangkeun and Sukumar, Sreenivas R and Hong, Seokyong and Lim, SeungHwan},
abstractNote = {The graph analysis is now considered as a promising technique to discover useful knowledge in data with a new perspective. We envision that there are two dimensions of graph analysis: OnLine Graph Analytic Processing (OLGAP) and Graph Mining (GM) where each respectively focuses on subgraph pattern matching and automatic knowledge discovery in graph. Moreover, as these two dimensions aim to complementarily solve complex problems, holistic in-situ graph analysis which covers both OLGAP and GM in a single system is critical for minimizing the burdens of operating multiple graph systems and transferring intermediate result sets between those systems. Nevertheless, most existing graph analysis systems are only capable of one dimension of graph analysis. In this work, we take an approach to enabling GM capabilities (e.g., PageRank, connected-component analysis, node eccentricity, etc.) in RDF triplestores, which are originally developed to store RDF datasets and provide OLGAP capability. More specifically, to achieve our goal, we implemented six representative graph mining algorithms using SPARQL. The approach allows a wide range of available RDF data sets directly applicable for holistic graph analysis within a system. For validation of our approach, we evaluate performance of our implementations with nine real-world datasets and three different computing environments: a laptop computer, an Amazon EC2 instance, and a shared-memory Cray XMT2 URIKA-GD graph-processing appliance. The experimental results show that our implementation can provide promising and scalable performance for real world graph analysis in all tested environments. The developed software is publicly available in an open-source project that we initiated.},
doi = {10.1016/j.eswa.2015.11.010},
journal = {Expert Systems with Applications},
number = 1,
volume = 48,
place = {United States},
year = {2016},
month = {1}
}