Prescriptive provenance for streaming analysis of workflows at scale
Abstract: We extend our approach to capturing and relating the provenance and performance metrics of computational workflows, which serves as a diagnostic tool for runtime optimization and placement. One important challenge is the volume of extracted data, for both performance metrics and provenance, even when filters are specified and attention is restricted to quantities of interest in a simulation. We reduce this data by performing anomaly detection on streaming data and storing provenance only for the detected anomalies, an approach we call prescriptive provenance. This paper discusses the Chimbuko architecture that enables the approach. We present its use with a protein structure propagation workflow based on NWChemEx. We are testing algorithms for anomaly detection and present preliminary results obtained with the Local Outlier Factor algorithm. While scaling remains a challenge, these results show that our robust Chimbuko architecture for streaming analysis with prescriptive provenance is a promising approach.
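To make the idea of prescriptive provenance concrete, the following is a minimal sketch and not the Chimbuko implementation. It assumes a stream of function-execution events, scores each fixed-size window with a plain Local Outlier Factor computed on runtimes, and keeps only the events whose score exceeds a threshold. The `Event` fields, the function names `lof_scores` and `prescriptive_filter`, and the window, `k`, and threshold values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical record layout; the real trace format is not given in the abstract.
    func: str
    rank: int
    runtime_us: float


def lof_scores(values, k):
    """Local Outlier Factor for 1-D values, using exactly k nearest neighbors.

    Simplification: ties at the k-distance are cut off at k neighbors.
    """
    n = len(values)
    if n <= k:
        raise ValueError("need more than k values")
    neighbors, k_dist = [], []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in dists[:k]])
        k_dist.append(dists[k - 1][0])
    eps = 1e-12  # guards against division by zero when many values coincide
    lrd = []
    for i in range(n):
        # Reachability distance from point i to each of its neighbors.
        reach = [max(k_dist[j], abs(values[i] - values[j])) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k + eps))
    # LOF: average neighbor density relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def prescriptive_filter(stream, window=20, k=5, threshold=1.5):
    """Yield (event, score) only for events flagged as anomalous.

    Events in a trailing, incomplete window are dropped in this sketch.
    """
    if window <= k:
        raise ValueError("window must be larger than k")
    buf = []
    for ev in stream:
        buf.append(ev)
        if len(buf) == window:
            scores = lof_scores([e.runtime_us for e in buf], k)
            for e, s in zip(buf, scores):
                if s > threshold:
                    yield e, s
            buf = []


# Usage: 20 synthetic events with one unusually slow call on rank 7.
demo = [Event("do_work", i, 100.0 + (i % 5)) for i in range(20)]
demo[7] = Event("do_work", 7, 1000.0)
flagged = list(prescriptive_filter(demo))
for ev, score in flagged:
    print(ev.func, ev.rank, ev.runtime_us)
```

Scoring per window keeps memory bounded, which is the point of filtering a stream, but it also means an anomaly is judged only against the other events in its own window; a production system would likely need a more careful choice of window and features.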
- Research Organization: Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
- DOE Contract Number: SC0012704
- OSTI ID: 1561255
- Report Number(s): BNL-212071-2019-COPR
- Country of Publication: United States
- Language: English
Similar Records
Capturing provenance as a diagnostic tool for workflow performance evaluation and optimization
Computational reproducibility of scientific workflows at extreme scales