
Title: Provenance In Sensor Data Management: A Cohesive, Independent Solution

In today's information-driven workplaces, data is constantly being transformed and moved around. The typical business-as-usual approach is to use email attachments, shared network locations, databases, and now, the cloud. More often than not, multiple versions of the data sit in different locations, and users of this data are confounded by the lack of metadata describing its provenance, or in other words, its lineage. Our project aims to solve this issue in the context of sensor data. The Oak Ridge National Laboratory's Building Technologies Research and Integration Center has reconfigurable commercial buildings deployed on the Flexible Research Platforms (FRPs). These FRPs are instrumented with a large number of sensors that measure variables such as HVAC efficiency, relative humidity, and temperature gradients across doors, windows, and walls. Sub-minute resolution data is acquired from hundreds of channels. Traditionally, this sensor data was saved to a shared network location accessible to a number of scientists performing complicated simulation and analysis tasks. Because the sensors are subject to inherent faults, the data also undergoes elaborate quality assurance exercises. Sometimes, faults are induced deliberately to observe building behavior. It became apparent that proper scientific controls required not just managing data acquisition and delivery, but also managing the metadata associated with temporal subsets of the sensor data. We built a system named ProvDMS, or Provenance Data Management System, for the FRPs, which allows researchers both to retrieve data of interest and to trace data lineage. This gives researchers a one-stop shop for comprehensive views of the various data transformations, allowing them to trace their data to its source so that experiments, and derivations of experiments, can be reused and reproduced without much overhead. Using these traces, researchers can determine exactly what happens to data as it moves through its life cycle.
 [1] ;  [1] ;  [1]
  1. ORNL
Publication Date:
OSTI Identifier:
DOE Contract Number:
Resource Type:
Journal Article
Resource Relation:
Journal Name: Communications of the ACM; Journal Volume: 57; Journal Issue: 2
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Building Technologies Research and Integration Center
Sponsoring Org:
USDOE Office of Energy Efficiency and Renewable Energy (EERE)
Country of Publication:
United States