U.S. Department of Energy
Office of Scientific and Technical Information

FireWorks: a Dynamic Workflow System Designed for High-Throughput Applications

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.3505 · OSTI ID: 1474899
 [1];  [2];  [1];  [3];  [1];  [1];  [3];  [4];  [4];  [4];  [3];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Environmental Energy and Technologies Division
  2. Univ. of California, San Diego, CA (United States). Dept. of Nanoengineering
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  4. Catholic Univ. of Louvain, Louvain-la-Neuve (Belgium). Inst. of Condensed Matter and Nanosciences (IMCN) and European Theoretical Spectroscopy Facility (ETSF)
This work introduces FireWorks, workflow software for running high-throughput calculation workflows at supercomputing centers. FireWorks has been used to complete over 50 million CPU-hours' worth of computational chemistry and materials science calculations at the National Energy Research Scientific Computing Center (NERSC). It has been designed to serve the demanding high-throughput computing needs of these applications, with extensive support for (i) concurrent execution through job packing, (ii) failure detection and correction, (iii) provenance and reporting for long-running projects, (iv) automated duplicate detection, and (v) dynamic workflows (i.e., modifying the workflow graph during runtime). We have found these features to be highly relevant to enabling modern data-driven and high-throughput science applications, and we discuss our implementation strategy, which rests on Python and a NoSQL database (MongoDB). Finally, we present performance data and limitations of our approach, along with planned future work.
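Two of the features named above, automated duplicate detection and dynamic workflows, can be illustrated with a minimal in-memory Python sketch. This is not the FireWorks API: the names `spec_key`, `Workflow`, `run`, and `execute` are hypothetical, plain dictionaries stand in for the MongoDB store the abstract describes, and duplicate detection is approximated here by hashing a canonical JSON form of each task's specification.

```python
import hashlib
import json


def spec_key(spec):
    """Hash a canonical JSON form of a task spec; identical specs share a key."""
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Workflow:
    """Hypothetical in-memory workflow graph (not FireWorks' MongoDB-backed store)."""

    def __init__(self):
        self.tasks = {}    # task id -> spec dict
        self.parents = {}  # task id -> set of parent task ids
        self.state = {}    # task id -> "WAITING" or "COMPLETED"
        self.results = {}  # task id -> result value
        self._next_id = 0

    def add(self, spec, parents=()):
        """Add a task; it becomes runnable once all its parents are COMPLETED."""
        tid = self._next_id
        self._next_id += 1
        self.tasks[tid] = spec
        self.parents[tid] = set(parents)
        self.state[tid] = "WAITING"
        return tid

    def ready(self):
        """Return ids of waiting tasks whose parents have all completed."""
        return [
            tid
            for tid, st in self.state.items()
            if st == "WAITING"
            and all(self.state[p] == "COMPLETED" for p in self.parents[tid])
        ]


def run(wf, execute, cache):
    """Run tasks in dependency order; return how many were actually executed.

    execute(spec) returns (result, new_specs); each new spec is added to the
    graph as a child of the task that produced it (a dynamic workflow).
    cache maps spec_key -> result: a task whose spec was already run reuses
    the stored result and spawns no children (duplicate detection).
    """
    executed = 0
    while True:
        batch = wf.ready()
        if not batch:
            break
        for tid in batch:
            key = spec_key(wf.tasks[tid])
            if key in cache:
                result, new_specs = cache[key], []
            else:
                result, new_specs = execute(wf.tasks[tid])
                cache[key] = result
                executed += 1
            wf.results[tid] = result
            wf.state[tid] = "COMPLETED"
            for spec in new_specs:
                wf.add(spec, parents=[tid])
    return executed


# Usage: each task with n < 2 appends a follow-up task to the graph at runtime.
def execute(spec):
    n = spec["n"]
    children = [{"n": n + 1}] if n < 2 else []
    return n * 10, children


wf = Workflow()
wf.add({"n": 0})
wf.add({"n": 0})  # identical spec: detected as a duplicate and not re-run
executed = run(wf, execute, cache={})
print(executed, wf.results)  # prints: 3 {0: 0, 1: 0, 2: 10, 3: 20}
```

Caching only the result means a duplicate task does not re-trigger the dynamic branch of its original; a production system would need an explicit policy for that case.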
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
European Union (EU); USDOE Office of Energy Efficiency and Renewable Energy (EERE); USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22). Joint Center for Energy Storage Research (JCESR); USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1474899
Alternate ID(s):
OSTI ID: 1401389
Journal Information:
Concurrency and Computation: Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English

Cited By (3)

Convergence and pitfalls of density functional perturbation theory phonons calculations from a high-throughput perspective journal March 2018
Semantic Interoperability and Characterization of Data Provenance in Computational Molecular Engineering journal December 2019
BioExcel-2 Deliverable 2.1 – State of the Art and Initial Roadmap text January 2019

Figures / Tables (15)


Similar Records

Measuring the impact of burst buffers on data-intensive scientific workflows
Journal Article · Sun Jun 16 20:00:00 EDT 2019 · Future Generations Computer Systems · OSTI ID:1603369

A Job Sizing Strategy for High-Throughput Scientific Workflows
Journal Article · Wed Jan 31 23:00:00 EST 2018 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1472078

DYFLOW: A flexible framework for orchestrating scientific workflows on supercomputers
Conference · Sun Aug 01 00:00:00 EDT 2021 · OSTI ID:1827006