A Noisy 10GB Provenance Database
Conference
·
OSTI ID:1093572
Provenance of scientific data is a key piece of the metadata record for the data's ongoing discovery and reuse. Provenance collection systems capture provenance on the fly; however, the protocol between application and provenance tool may not be reliable. Consequently, the provenance record can be partial, partitioned, and simply inaccurate. We use a workflow emulator that models faults to construct a large 10GB database of provenance that we know is noisy (that is, has errors). We discuss the process of generating the provenance database and show early results on the kinds of provenance analysis enabled by the large provenance database.
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- Computational Research Division
- DOE Contract Number:
- DE-AC02-05CH11231
- OSTI ID:
- 1093572
- Report Number(s):
- LBNL-5436E
- Resource Relation:
- Conference: Second International Workshop on Traceability and Compliance of Semi-Structured Processes (TC4SP2011), co-located with Business Process Management (BPM 2011), Clermont-Ferrand, France
- Country of Publication:
- United States
- Language:
- English
Similar Records
Provenance In Sensor Data Management: A Cohesive, Independent Solution
Journal Article
·
Wed Jan 01 00:00:00 EST 2014
· Communications of the ACM
·
OSTI ID:1093572
Applying Content Management to Automated Provenance Capture
Journal Article
·
Thu Apr 10 00:00:00 EDT 2008
· Concurrency and Computation. Practice & Experience, 20(5):541-554
·
OSTI ID:1093572
Automated metadata, provenance cataloging and navigable interfaces: ensuring the usefulness of extreme-scale data
Technical Report
·
Thu Dec 15 00:00:00 EST 2016
·
OSTI ID:1093572