U.S. Department of Energy
Office of Scientific and Technical Information

A Noisy 10GB Provenance Database

Conference · OSTI ID: 1093572
Provenance of scientific data is a key piece of the metadata record for the data's ongoing discovery and reuse. Provenance collection systems capture provenance on the fly; however, the protocol between the application and the provenance tool may not be reliable. Consequently, the provenance record can be partial, partitioned, or simply inaccurate. We use a workflow emulator that models faults to construct a large, 10GB database of provenance that we know is noisy (that is, it contains errors). We discuss the process of generating the provenance database and show early results on the kinds of provenance analysis that such a large provenance database enables.
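The record does not describe the emulator's internals, so the following is only a hypothetical sketch of the general idea: emulate a workflow, keep its ground-truth provenance, and pass each provenance message through an unreliable channel so that the errors in the stored record are known. It assumes a linear pipeline, a per-message drop probability (partial records), a per-message corruption probability (inaccurate records), and a "partition" modeled as losing every message from one task, which splits the provenance graph into disconnected pieces. All function names and the `used`/`wasGeneratedBy` relation labels are illustrative choices, not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvRecord:
    """One provenance assertion (hypothetical schema).

    relation is "used" (task used artifact) or
    "wasGeneratedBy" (artifact was generated by task).
    """
    task: str
    relation: str
    artifact: str


def emulate_workflow(n_tasks):
    """Ground-truth provenance for an assumed linear pipeline d0 -> t0 -> d1 -> t1 -> ..."""
    records = []
    for i in range(n_tasks):
        task = f"t{i}"
        records.append(ProvRecord(task, "used", f"d{i}"))
        records.append(ProvRecord(task, "wasGeneratedBy", f"d{i + 1}"))
    return records


def unreliable_channel(records, p_drop, p_corrupt, partition_task, rng):
    """Deliver records through an assumed fault model; returns what the collector stores."""
    observed = []
    for rec in records:
        if rec.task == partition_task:
            continue  # partition: every message from this task is lost
        r = rng.random()
        if r < p_drop:
            continue  # partial: this single message is lost
        if r < p_drop + p_corrupt:
            # inaccurate: the message arrives but names the wrong artifact
            rec = ProvRecord(rec.task, rec.relation, rec.artifact + "?")
        observed.append(rec)
    return observed


def noise_report(truth, observed):
    """Compare the stored record against ground truth, since the noise is known."""
    truth_set, obs_set = set(truth), set(observed)
    return {
        "missing": len(truth_set - obs_set),
        "spurious": len(obs_set - truth_set),
        "correct": len(truth_set & obs_set),
    }


def connected_components(records):
    """Count connected pieces of the provenance graph using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rec in records:
        parent[find(rec.task)] = find(rec.artifact)  # union task and artifact
    return len({find(x) for x in parent})


if __name__ == "__main__":
    rng = random.Random(42)
    truth = emulate_workflow(100)
    observed = unreliable_channel(truth, p_drop=0.05, p_corrupt=0.02,
                                  partition_task="t50", rng=rng)
    print(noise_report(truth, observed))
    print("components:", connected_components(observed))
```

Keeping the ground truth alongside the noisy copy is what makes the database useful for evaluation: any analysis run on the stored record can be scored against the known-correct provenance. A corrupted message counts both as a missing true record and as a spurious one.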
Research Organization: Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization: Computational Research Division
DOE Contract Number: AC02-05CH11231
OSTI ID: 1093572
Report Number(s): LBNL-5436E
Country of Publication: United States
Language: English

Similar Records

Applying Content Management to Automated Provenance Capture
Journal Article · April 10, 2008 · Concurrency and Computation: Practice & Experience, 20(5):541-554 · OSTI ID: 927710

Automated metadata, provenance cataloging and navigable interfaces: ensuring the usefulness of extreme-scale data
Technical Report · December 14, 2016 · OSTI ID: 1335866

Provenance management in Swift with implementation details.
Technical Report · April 1, 2011 · OSTI ID: 1011306