skip to main content

SciTech ConnectSciTech Connect

Title: In-Memory Graph Databases for Web-Scale Data

RDF databases have emerged as one of the most relevant way for organizing, integrating, and managing expo- nentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of: a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network messages aggregation and a partitioned global address space. We provide an overview of the framework, detailing its component and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in details the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.
; ; ; ; ; ;
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Journal Article
Resource Relation:
Journal Name: Computer, 48(3):24-35
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
Country of Publication:
United States
GEMS; RDF; Graph databases; multithreading