Provenance infrastructure for multimodal x-ray experiments and reproducible analysis
- Brookhaven National Lab. (BNL), Upton, NY (United States). Computational Science Center
- Brookhaven National Lab. (BNL), Upton, NY (United States). National Synchrotron Light Source II (NSLS-II)
- Columbia Univ., New York, NY (United States). Dept. of Applied Physics and Applied Mathematics
Exploring relationships between material structures and desired properties for material discovery and synthesis requires mining machine-readable databases and linking experimental data with sufficiently rich annotations. There are vast gaps both in these data sources and in the connections between them. We present a provenance-based data management and analysis framework that enables capturing, persisting, and reanalyzing experimental data from the BNL National Synchrotron Light Source II (NSLS-II). Our system leverages NSLS-II Bluesky for experimental data acquisition, captures analysis parameters, and enriches the search space with results from the scientific literature. We describe in detail the challenges and benefits of data-driven approaches when comparing computational and experimental data, including missing data and the multiple possible scenarios of interest that arise in often ill-posed inverse problems. Our infrastructure utilizes the NSLS-II event model and includes discovery mechanisms across multiple beamlines. A text and data mining module uses natural language processing classification techniques to extract relevant information from the literature and add it to the data sources. We describe a provenance acquisition and analysis rerun workflow based on a directed acyclic graph that keeps track of process and data provenance and allows analyses to be replayed.
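The directed-acyclic-graph replay idea mentioned at the end of the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Bluesky API. The function names (`run_workflow`, `replay`), the step names, and the record layout are all hypothetical, chosen only to show how recording each step's inputs and parameters makes a later rerun possible.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, deps, inputs):
    """Run steps in dependency order and record provenance for each one.

    steps:  {step_name: (function, keyword_parameters)}   (hypothetical layout)
    deps:   {step_name: [names of upstream data or steps]}
    inputs: {name: raw data} for nodes that are not computed
    """
    results = dict(inputs)
    provenance = []
    for name in TopologicalSorter(deps).static_order():
        if name in results:  # raw input, nothing to compute
            continue
        func, params = steps[name]
        parents = list(deps[name])
        results[name] = func(*(results[p] for p in parents), **params)
        # Process provenance: which step ran, on what, with which parameters.
        provenance.append(
            {"step": name, "inputs": parents, "params": dict(params)}
        )
    return results, provenance


def replay(steps, deps, inputs, provenance):
    """Rerun the workflow using the parameters stored in the provenance log."""
    recorded = {
        rec["step"]: (steps[rec["step"]][0], rec["params"]) for rec in provenance
    }
    return run_workflow(recorded, deps, inputs)


# Hypothetical two-step analysis: offset subtraction, then scaling.
def subtract(data, offset):
    return [x - offset for x in data]


def scale(data, factor):
    return [x * factor for x in data]


steps = {
    "subtracted": (subtract, {"offset": 1.0}),
    "scaled": (scale, {"factor": 2.0}),
}
deps = {"subtracted": ["raw"], "scaled": ["subtracted"]}
inputs = {"raw": [1.0, 2.0, 3.0]}

results, provenance = run_workflow(steps, deps, inputs)
replayed, _ = replay(steps, deps, inputs, provenance)
```

Because the graph is acyclic, a topological order always exists, so every step runs only after the data it depends on is available. Replaying from the stored log rather than from the current step definitions means the rerun uses the parameters that were actually recorded.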
- Research Organization: Brookhaven National Lab. (BNL), Upton, NY (United States)
- Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
- DOE Contract Number: SC0012704
- OSTI ID: 1608436
- Report Number(s): BNL-213790-2020-BOOK
- Country of Publication: United States
- Language: English
Similar Records
Next generation experimental data access at NSLS-II
A Python Instrument Control and Data Acquisition Suite for Reproducible Research