Applying Content Management to Automated Provenance Capture
Workflows and data pipelines are becoming increasingly valuable in both computational and experimental sciences. These automated systems can generate significantly more data in the same amount of time than their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. Our goal in addressing the provenance challenge was to develop an end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow tools for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities and, through the use of standards, provides access to the provenance record from a variety of commonly available client tools.
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 927710
- Report Number(s):
- PNNL-SA-52935; TRN: US200816%%1171
- Journal Information:
- Concurrency and Computation: Practice & Experience, Vol. 20, Issue 5, pp. 541-554
- Country of Publication:
- United States
- Language:
- English
Similar Records
Adapting the Electronic Laboratory Notebook for the Semantic Era
The MPO system for automatic workflow documentation