U.S. Department of Energy
Office of Scientific and Technical Information

Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows

Conference · Workshop on Workflows in Support of Large-Scale Science

Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create "workflow snapshots" that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
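The idea of workflow snapshots that record versions and their lineage can be illustrated with a minimal sketch. This is not Science Capsule's implementation or API: the names `Event`, `Snapshot`, `SnapshotStore`, `lineage`, and `history`, as well as the sample file path and note, are hypothetical and chosen only for illustration. The sketch assumes each snapshot stores the events captured since the snapshot it extends, plus a pointer to that parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One captured event, e.g. a file write or a user note (hypothetical shape)."""
    kind: str
    detail: str


@dataclass
class Snapshot:
    """A version of a workflow plus a pointer to the version it extends."""
    snapshot_id: str
    parent_id: Optional[str]
    events: List[Event] = field(default_factory=list)


class SnapshotStore:
    """In-memory store; a real system would persist snapshots somewhere."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Snapshot] = {}

    def create(self, snapshot_id: str, events: List[Event],
               parent_id: Optional[str] = None) -> Snapshot:
        if snapshot_id in self._snapshots:
            raise ValueError(f"snapshot {snapshot_id!r} already exists")
        if parent_id is not None and parent_id not in self._snapshots:
            raise KeyError(f"unknown parent snapshot {parent_id!r}")
        snap = Snapshot(snapshot_id, parent_id, list(events))
        self._snapshots[snapshot_id] = snap
        return snap

    def lineage(self, snapshot_id: str) -> List[str]:
        """Return snapshot ids from the oldest ancestor down to snapshot_id."""
        chain: List[str] = []
        current: Optional[str] = snapshot_id
        while current is not None:
            snap = self._snapshots[current]
            chain.append(snap.snapshot_id)
            current = snap.parent_id
        chain.reverse()
        return chain

    def history(self, snapshot_id: str) -> List[Event]:
        """Return all events along the lineage, oldest snapshot first."""
        events: List[Event] = []
        for sid in self.lineage(snapshot_id):
            events.extend(self._snapshots[sid].events)
        return events


# Usage: one scientist records v1, a collaborator extends it as v2.
store = SnapshotStore()
store.create("v1", [Event("file_created", "raw/scan_001.h5")])
store.create("v2", [Event("note", "masked bad pixels")], parent_id="v1")
print(store.lineage("v2"))
print([e.kind for e in store.history("v2")])
```

Keeping only a parent pointer per snapshot means that extending a shared workflow never rewrites earlier versions; the full context of any version is recovered by walking the chain back to the first snapshot.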

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1833998
Journal Information:
Workshop on Workflows in Support of Large-Scale Science, Vol. 2021; Conference: 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), St. Louis, MO, USA; ISSN 2151-1373
Publisher:
IEEE
Country of Publication:
United States
Language:
English

References (19)

LabelFlow Framework for Annotating Workflow Provenance journal February 2018
iRODS Primer: Integrated Rule-Oriented Data System journal January 2010
Data at work: supporting sharing in science and engineering conference January 2003
If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology journal July 2013
Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication journal July 2020
Open is not enough journal November 2018
Xi-cam : a versatile interface for data visualization and analysis journal May 2018
Provenance and data differencing for workflow reproducibility analysis journal April 2013
FireWorks: A Dynamic Workflow System Designed for High-Throughput Applications journal May 2015
Computing environments for reproducibility: Capturing the “Whole Tale” journal May 2019
ReproZip: Computational Reproducibility With Ease conference June 2016
  • Chirigati, Fernando; Rampin, Rémi; Shasha, Dennis
  • SIGMOD/PODS'16: International Conference on Management of Data, Proceedings of the 2016 International Conference on Management of Data https://doi.org/10.1145/2882903.2899401
Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems journal January 2005
REANA: A System for Reusable Research Data Analyses journal January 2019
An empirical analysis of journal policy effectiveness for computational reproducibility journal March 2018
Temporal representation for mining scientific data provenance journal July 2014
Mining Taverna's semantic web of provenance journal January 2008
Ontologies: principles, methods and applications journal June 1996
PDiffView journal August 2009
The W3C PROV family of specifications for modelling provenance metadata conference March 2013