
Title: In-Memory Graph Databases for Web-Scale Data

Abstract

RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.

Authors:
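Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.; Tumeo, Antonino; Haglin, David J.; Villa, Oreste; Feo, John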
Publication Date:
2015-03-01
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1208713
Report Number(s):
PNNL-SA-107315
400470000
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article
Journal Name:
Computer, 48(3):24-35
Country of Publication:
United States
Language:
English
Subject:
GEMS; RDF; Graph databases; multithreading

Citation Formats

Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, and Feo, John. In-Memory Graph Databases for Web-Scale Data. United States: N. p., 2015. Web. doi:10.1109/MC.2015.74.
Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, & Feo, John. In-Memory Graph Databases for Web-Scale Data. United States. https://doi.org/10.1109/MC.2015.74
Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, and Feo, John. 2015. "In-Memory Graph Databases for Web-Scale Data". United States. https://doi.org/10.1109/MC.2015.74.
@article{osti_1208713,
title = {In-Memory Graph Databases for Web-Scale Data},
author = {Castellana, Vito G. and Morari, Alessandro and Weaver, Jesse R. and Tumeo, Antonino and Haglin, David J. and Villa, Oreste and Feo, John},
abstractNote = {RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.},
doi = {10.1109/MC.2015.74},
url = {https://www.osti.gov/biblio/1208713},
journal = {Computer},
volume = {48},
number = {3},
pages = {24--35},
place = {United States},
year = {2015},
month = mar
}