U.S. Department of Energy
Office of Scientific and Technical Information

Prescriptive provenance for streaming analysis of workflows at scale

Conference

Abstract: We extend our approach to capturing and relating the provenance and performance metrics of computational workflows as a diagnostic tool for runtime optimization and placement. One important challenge is the volume of extracted data, for both performance metrics and provenance, even when filters are specified and capture is focused on quantities of interest in a simulation. We reduce this data by performing anomaly detection on the streaming data and storing provenance only for the detected anomalies, an approach we call prescriptive provenance. This paper discusses the Chimbuko architecture that enables the approach. We demonstrate it with a protein structure propagation workflow based on NWChemEx. We are testing algorithms for anomaly detection and present preliminary results obtained with Local Outlier Factor. While scaling remains a challenge, these results show that our robust Chimbuko architecture for streaming analysis with prescriptive provenance is a promising approach.
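The idea in the abstract, scoring streamed performance events with Local Outlier Factor and keeping provenance only for the outliers, can be illustrated with a minimal Python sketch. This is not the Chimbuko implementation. The event fields (`event_id`, `function`, `rank`, `runtime_us`), the function name `md_step`, the window size, `k`, and the threshold are all hypothetical choices made for illustration, and the LOF routine is a simplified version that uses exactly `k` neighbours and ignores ties.

```python
import math


def local_outlier_factor(points, k=3):
    """Simplified LOF: one score per point, higher means more anomalous.

    Assumption: exactly k nearest neighbours are used (ties are not expanded).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against zero
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def prescriptive_provenance(events, window=20, k=3, threshold=2.0):
    """Yield provenance records only for events flagged as anomalous.

    Events are scored in fixed-size windows; a trailing partial window is
    dropped in this sketch.
    """
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == window:
            scores = local_outlier_factor([(e["runtime_us"],) for e in batch], k)
            for item, score in zip(batch, scores):
                if score > threshold:
                    yield {**item, "lof": score}
            batch = []


# Synthetic stream (hypothetical data): runtimes near 100-110 us, one slow call.
stream = [
    {"event_id": i, "function": "md_step", "rank": i % 4,
     "runtime_us": 100.0 + 0.5 * ((7 * i) % 20)}
    for i in range(40)
]
stream[17]["runtime_us"] = 900.0  # injected anomaly

records = list(prescriptive_provenance(stream))
for record in records:
    print(record["event_id"], record["function"], record["runtime_us"])
```

In this sketch only the flagged event is retained, so the stored provenance grows with the number of anomalies rather than with the number of events, which is the data reduction the abstract describes.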

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
DOE Contract Number:
SC0012704
OSTI ID:
1561255
Report Number(s):
BNL-212071-2019-COPR
Country of Publication:
United States
Language:
English

Similar Records

Capturing provenance as a diagnostic tool for workflow performance evaluation and optimization
Conference · Aug 6, 2017 · OSTI ID: 1619260

Capturing provenance as a diagnostic tool for workflow performance evaluation and optimization
Conference · Aug 30, 2017 · Proceedings · OSTI ID: 1556918

Computational reproducibility of scientific workflows at extreme scales
Journal Article · Apr 8, 2019 · International Journal of High Performance Computing Applications · OSTI ID: 1542776
