SciTech Connect

Title: Toward a Data Scalable Solution for Facilitating Discovery of Scientific Data Resources

Science is increasingly driven by the need to process ever larger quantities of data. It faces severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" now compete with, and in many fields surpass, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate and weather, and materials science. This paper presents a real-world use case in which we answer queries posed by domain scientists to facilitate the discovery of relevant science resources. The problem is that the metadata describing these science resources is very large and growing quickly, which makes a data-scaling solution increasingly necessary. We propose using our SGEM stack, a system designed to answer graph-based queries over large datasets on cluster architectures, to answer complex queries over this metadata, and we report early results for its current capability.
Authors:
Publication Date:
OSTI Identifier:
1123247
Report Number(s):
PNNL-SA-98169
400470000
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: DISCS-2013: Proceedings of the International Workshop on Data-Intensive Scalable Computing Systems, November 18, 2013, Denver, CO, pp. 55-60
Publisher:
Association for Computing Machinery, New York, NY, United States (US).
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English