Query optimization for graph analytics on linked data using SPARQL
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type:
- Technical Report
- Research Org:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org:
- USDOE Laboratory Directed Research and Development (LDRD) Program
- Country of Publication:
- United States
- urika; SPARQL; big data; semantic web
Enter terms in the toolbar above to search the full text of this document for pages containing specific keywords.