skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Scientific Data Provenance API for Distributed Applications

Conference ·

Data provenance has been an active area of research as a means to standardize how the origin of data, process event history, and what or who was responsible for influencing results is explained. There are two approaches to capture provenance information. The first approach is to collect observed evidence produced by an executing application using log files, event listeners, and temporary files that are used by the application or application developer. The provenance translated from these observations is an interpretation of the provided evidence. The second approach is called disclosed because the application provides a firsthand account of the provenance based on the anticipated questions on data flow, process flow, and responsible agents. Most observed provenance collection systems collect lot of provenance information during an application run or workflow execution. The common trend in capturing provenance is to collect all possible information, then attempt to find relevant information, which is not efficient. Existing disclosed provenance system APIs do not work well in distributed environment and have trouble finding where to fit the individual pieces of provenance information. This work focuses on determining more reliable solutions for provenance capture. As part of the Integrated End-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project, an API was developed, called Producer API (PAPI), which can disclose application targeted provenance, designed to work in distributed environments by means of unique object identification methods. The provenance disclosure approach used adds additional metadata to the provenance information to uniquely identify the pieces and connect them together. PAPI uses a common provenance model to support this provenance integration across disclosure sources. The API also provides the flexibility to let the user decide what to do with the collected provenance. The collected provenance can be sent to a triple store using REST services or it can be logged to a file.

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1440706
Report Number(s):
PNNL-SA-119915; KJ0404000
Resource Relation:
Conference: International Conference on Collaboration Technologies and Systems (CTS 2016), October 31-November 4, 2016, Orlando, Florida, 104-111
Country of Publication:
United States
Language:
English

Related Subjects