OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Flexible and Scalable Data Fusion using Proactive Schemaless Information Services

Technical Report · DOI: https://doi.org/10.2172/1323278 · OSTI ID: 1323278
 [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Exascale data environments are fast approaching, driven by diverse structured and unstructured data such as system and application telemetry streams, open-source information capture, and on-demand simulation output. Storage costs having plummeted, the question is now one of converting vast stores of data to actionable information. Complicating this problem are the low degrees of awareness across domain boundaries about what potentially useful data may exist, and write-once-read-never issues (data generation/collection rates outpacing data analysis and integration rates). Increasingly, technologists and researchers need to correlate previously unrelated data sources and artifacts to produce fused data views for domain-specific purposes. New tools and approaches for creating such views from vast amounts of data are vitally important to maintaining research and operational momentum. We propose to research and develop tools and services to assist in the creation, refinement, discovery and reuse of fused data views over large, diverse collections of heterogeneously structured data. We innovate in the following ways. First, we enable and encourage end-users to introduce customized index methods selected for local benefit rather than for global interaction (flexible multi-indexing). We envision rich combinations of such views on application data: views that span backing stores with different semantics, that introduce analytic methods of indexing, and that define multiple views on individual data items. We specifically decline to build a big fused database of everything providing a centralized index over all data, or to export a rigid schema to all comers as in federated query approaches. Second, we proactively advertise these application-specific views so that they may be programmatically reused and extended (data proactivity). Through this mechanism, both changes in state (new data collected in an existing view) and changes in structure (a new or derived view exists) are made known. Lastly, we embrace found data heterogeneity by coupling multi-indexing to backing stores with appropriate semantics (as opposed to a single store or schema).
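
The two central ideas in the abstract, flexible multi-indexing and data proactivity, can be illustrated with a small in-memory sketch. This is not the report's implementation: the names `View`, `Registry`, `register_view`, `ingest`, and the event dictionary layout are all hypothetical, chosen only to show how user-supplied index functions and advertised state/structure changes might fit together. Backing stores with different semantics are not modeled here.

```python
from collections import defaultdict


class View:
    """A user-defined view: an index over items, keyed by a user-chosen function.

    Hypothetical illustration; index_fn is picked for local benefit and may
    return None to indicate that an item does not belong in this view.
    """

    def __init__(self, name, index_fn):
        self.name = name
        self.index_fn = index_fn
        self.index = defaultdict(list)  # index key -> list of item ids

    def add(self, item_id, item):
        key = self.index_fn(item)
        if key is not None:
            self.index[key].append(item_id)
        return key


class Registry:
    """Holds many independent views and advertises changes to subscribers."""

    def __init__(self):
        self.views = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _advertise(self, event):
        for callback in self.subscribers:
            callback(event)

    def register_view(self, view):
        # Change in structure: a new or derived view now exists.
        self.views[view.name] = view
        self._advertise({"kind": "structure", "view": view.name})

    def ingest(self, item_id, item):
        # Change in state: new data collected in an existing view.
        for view in self.views.values():
            key = view.add(item_id, item)
            if key is not None:
                self._advertise(
                    {"kind": "state", "view": view.name, "key": key, "item": item_id}
                )
```

In this sketch a single item can appear in several views at once, and no global schema or central index is imposed: a subscriber learns about each view and each update only through the advertised events.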

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number: AC04-94AL85000
OSTI ID: 1323278
Report Number(s): SAND2014-16250; 533748
Country of Publication: United States
Language: English

Similar Records

Flexible and Scalable Data Fusion using Proactive, Schemaless Information Services
Technical Report · May 1, 2014 · OSTI ID:1323278

Proactive Data Containers for Scientific Storage (Final Report)
Technical Report · December 10, 2019 · OSTI ID:1323278

The Materials Data Facility: Data Services to Advance Materials Science Research
Journal Article · July 6, 2016 · JOM. Journal of the Minerals, Metals & Materials Society · OSTI ID:1323278