In-Memory Graph Databases for Web-Scale Data
Abstract
RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods, and this is reflected in every layer of its stack. The GEMS framework comprises a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been designed and customized in close coordination to address the challenges of applying graph methods to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of queries (expressed in SPARQL, the query language of choice for RDF databases) into graph methods, and identify differences with respect to other RDF databases.
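To make concrete what translating a SPARQL query into a graph method can mean, the following minimal Python sketch treats each RDF triple as a labeled edge and evaluates a basic graph pattern (a conjunction of triple patterns) as a backtracking walk over those edges. It assumes a single-process, in-memory adjacency list keyed by subject, no predicate or object indexes, and a fixed left-to-right pattern order. The names `build_graph` and `match_bgp` and the sample data are hypothetical; this is not GEMS's generated code, which the abstract describes as C++ running on a multithreaded, distributed runtime.

```python
from collections import defaultdict


def build_graph(triples):
    """Store each RDF triple (s, p, o) as a labeled out-edge s --p--> o."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    return out_edges


def is_var(term):
    """SPARQL-style variables are written with a leading '?'."""
    return term.startswith("?")


def resolve(term, binding):
    """Return a variable's current value, or the term itself if unbound or constant."""
    return binding.get(term, term) if is_var(term) else term


def match_bgp(graph, patterns, binding=None):
    """Yield every binding that satisfies all triple patterns (a basic graph pattern).

    Patterns are matched left to right; each one walks the out-edges of its
    subject vertex and extends the partial match, backtracking on conflicts.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    s, p, o = patterns[0]
    start = resolve(s, binding)
    # An unbound subject means any vertex with out-edges could start the walk.
    candidates = list(graph) if is_var(start) else [start]
    for subj in candidates:
        for pred, obj in graph.get(subj, []):
            extended = dict(binding)
            consistent = True
            for term, value in ((s, subj), (p, pred), (o, obj)):
                current = resolve(term, extended)
                if is_var(current):
                    extended[term] = value  # bind a fresh variable
                elif current != value:
                    consistent = False  # constant or earlier binding disagrees
                    break
            if consistent:
                yield from match_bgp(graph, patterns[1:], extended)


# Hypothetical data and query: ?x knows ?y . ?y knows ?z
graph = build_graph([
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "PNNL"),
])
for row in match_bgp(graph, [("?x", "knows", "?y"), ("?y", "knows", "?z")]):
    print(row)  # {'?x': 'alice', '?y': 'bob', '?z': 'carol'}
```

A distributed engine would additionally need to partition the adjacency lists across nodes and batch the resulting remote edge lookups into larger messages, which is the role the abstract assigns to the GEMS runtime (partitioned global address space and network message aggregation).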
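- Authors:
- Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.; Tumeo, Antonino; Haglin, David J.; Villa, Oreste; Feo, John
- Publication Date:
- March 1, 2015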
- Research Org.:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1208713
- Report Number(s):
- PNNL-SA-107315; 400470000
- DOE Contract Number:
- AC05-76RL01830
- Resource Type:
- Journal Article
- Journal Name:
- Computer, 48(3):24-35
- Country of Publication:
- United States
- Language:
- English
- Subject:
- GEMS; RDF; Graph databases; multithreading
Citation Formats
Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, and Feo, John. In-Memory Graph Databases for Web-Scale Data. United States: N. p., 2015. Web. doi:10.1109/MC.2015.74.
Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, & Feo, John. In-Memory Graph Databases for Web-Scale Data. United States. https://doi.org/10.1109/MC.2015.74
Castellana, Vito G., Morari, Alessandro, Weaver, Jesse R., Tumeo, Antonino, Haglin, David J., Villa, Oreste, and Feo, John. 2015. "In-Memory Graph Databases for Web-Scale Data". United States. https://doi.org/10.1109/MC.2015.74.
@article{osti_1208713,
title = {In-Memory Graph Databases for Web-Scale Data},
author = {Castellana, Vito G. and Morari, Alessandro and Weaver, Jesse R. and Tumeo, Antonino and Haglin, David J. and Villa, Oreste and Feo, John},
abstractNote = {RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods, and this is reflected in every layer of its stack. The GEMS framework comprises a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been designed and customized in close coordination to address the challenges of applying graph methods to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of queries (expressed in SPARQL, the query language of choice for RDF databases) into graph methods, and identify differences with respect to other RDF databases.},
doi = {10.1109/MC.2015.74},
url = {https://www.osti.gov/biblio/1208713},
journal = {Computer},
volume = {48},
number = {3},
pages = {24--35},
place = {United States},
year = {2015},
month = mar
}