High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS
Graphs are an increasingly popular data model for data analytics, since they naturally represent relationships and interactions between entities. Relational databases, with their purely table-based data model, are not well suited to storing and processing sparse data. Consequently, graph databases have gained interest in the last few years, and the Resource Description Framework (RDF) has become the standard data model for graph data. Nevertheless, while RDF is well suited to analyzing the relationships between entities, it is not efficient at representing their attributes and properties. In this work we propose the adoption of a new hybrid data model, based on attributed graphs, that aims to overcome the limitations of the pure relational and graph data models. We present how we have redesigned the GEMS data-analytics framework to take full advantage of the proposed hybrid data model. To improve analysts' productivity, in addition to a C++ API for application development, we adopt GraQL as the input query language. We validate our approach by implementing a set of queries on net-flow data and comparing the performance of our framework against Neo4j. Experimental results show significant performance improvements over Neo4j, up to several orders of magnitude as the size of the input data increases.
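To make the contrast between a pure triple model and an attributed graph concrete, the following is a minimal sketch in C++. It is not the GEMS data layout or its C++ API; the `Flow` record, its fields, and all type and function names are hypothetical and chosen only for illustration. In a plain triple encoding an edge cannot carry attributes, so each flow becomes its own node with one triple per endpoint and per attribute. In the attributed-graph encoding the same flow is a single edge whose attributes are typed fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical net-flow record (field names are illustrative, not from GEMS).
struct Flow {
    std::string src_ip;
    std::string dst_ip;
    std::uint16_t dst_port;
    std::uint64_t bytes;
};

// Pure triple representation: every fact is (subject, predicate, object).
struct Triple {
    std::string subject, predicate, object;
};

// Each flow becomes a node "flow/<i>" linked to its endpoints, plus one
// triple per attribute, so attribute values are stored as strings.
std::vector<Triple> to_triples(const std::vector<Flow>& flows) {
    std::vector<Triple> out;
    for (std::size_t i = 0; i < flows.size(); ++i) {
        const std::string f = "flow/" + std::to_string(i);
        const Flow& fl = flows[i];
        out.push_back({f, "src", fl.src_ip});
        out.push_back({f, "dst", fl.dst_ip});
        out.push_back({f, "dstPort", std::to_string(fl.dst_port)});
        out.push_back({f, "bytes", std::to_string(fl.bytes)});
    }
    return out;
}

// Attributed-graph representation: one edge per flow, with typed attributes
// stored directly on the edge.
struct Edge {
    std::size_t src, dst;    // vertex indices
    std::uint16_t dst_port;  // typed edge attribute
    std::uint64_t bytes;     // typed edge attribute
};

struct AttributedGraph {
    std::vector<std::string> vertices;  // vertex attribute: IP address
    std::vector<Edge> edges;
    std::unordered_map<std::string, std::size_t> index;  // IP -> vertex id

    // Returns the id of the vertex for `ip`, creating it on first use.
    std::size_t vertex_id(const std::string& ip) {
        auto it = index.find(ip);
        if (it != index.end()) return it->second;
        vertices.push_back(ip);
        index.emplace(ip, vertices.size() - 1);
        return vertices.size() - 1;
    }
};

AttributedGraph to_attributed_graph(const std::vector<Flow>& flows) {
    AttributedGraph g;
    for (const Flow& fl : flows) {
        const std::size_t s = g.vertex_id(fl.src_ip);
        const std::size_t d = g.vertex_id(fl.dst_ip);
        g.edges.push_back(Edge{s, d, fl.dst_port, fl.bytes});
    }
    return g;
}
```

For two flows sharing a source address, the triple form needs eight triples, while the attributed graph holds three vertices and two edges. A query such as "total bytes to port 443" can then scan edge fields directly instead of joining several triples per flow.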
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1440650
- Report Number(s):
- PNNL-SA-124655; 453040300
- Resource Relation:
- Conference: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2017), May 29-June 2, 2017, Orlando, Florida, pp. 1029-1038
- Country of Publication:
- United States
- Language:
- English