
Title: FireWorks: a Dynamic Workflow System Designed for High-Throughput Applications

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.3505 · OSTI ID: 1474899
 [1];  [2];  [1];  [3];  [1];  [1];  [3];  [4];  [4];  [4];  [3];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Environmental Energy and Technologies Division
  2. Univ. of California, San Diego, CA (United States). Dept. of Nanoengineering
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  4. Catholic Univ. of Louvain, Louvain-la-Neuve (Belgium). Inst. of Condensed Matter and Nanosciences (IMCN) and European Theoretical Spectroscopy Facility (ETSF)

This work introduces FireWorks, workflow software for running high-throughput calculation workflows at supercomputing centers. FireWorks has been used to complete over 50 million CPU-hours of computational chemistry and materials science calculations at the National Energy Research Scientific Computing Center (NERSC). It is designed to serve the demanding high-throughput computing needs of these applications, with extensive support for (i) concurrent execution through job packing, (ii) failure detection and correction, (iii) provenance and reporting for long-running projects, (iv) automated duplicate detection, and (v) dynamic workflows (i.e., modifying the workflow graph during runtime). We have found these features to be highly relevant to enabling modern data-driven and high-throughput science applications, and we discuss our implementation strategy, which rests on Python and a NoSQL database (MongoDB). Finally, we present performance data and the limitations of our approach, along with planned future work.
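As a rough illustration of two features named in the abstract, dynamic workflows and automated duplicate detection, the following minimal Python sketch runs a queue of task specifications in which a task may append follow-up tasks at runtime, and in which a specification identical to one already run is skipped. It is not the FireWorks API and does not use MongoDB: the names `spec_fingerprint`, `run_workflow`, and `relax`, the dictionary-based task spec, and the in-memory result store are all assumptions made for this example.

```python
from collections import deque
import hashlib
import json


def spec_fingerprint(spec):
    """Hash a JSON-serializable task spec so identical specs can be detected."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


def run_workflow(initial_specs, handlers):
    """Run task specs in FIFO order; a handler may return new specs to append.

    Returns (results keyed by fingerprint, number of duplicate specs skipped).
    """
    queue = deque(initial_specs)
    results = {}
    skipped = 0
    while queue:
        spec = queue.popleft()
        key = spec_fingerprint(spec)
        if key in results:       # duplicate detection: this spec already ran
            skipped += 1
            continue
        output, new_specs = handlers[spec["task"]](spec)
        results[key] = output
        queue.extend(new_specs)  # dynamic step: the workflow grows at runtime
    return results, skipped


def relax(spec):
    """Toy 'calculation': halve the error; request another pass if not converged."""
    error = spec["error"] / 2
    follow_up = [] if error < 0.1 else [{"task": "relax", "error": error}]
    return error, follow_up


# Two identical starting specs are submitted; the second is detected as a duplicate.
results, skipped = run_workflow(
    [{"task": "relax", "error": 1.0}, {"task": "relax", "error": 1.0}],
    {"relax": relax},
)
print(sorted(results.values()), skipped)
```

In this sketch the number of tasks is not known when the workflow is submitted: each `relax` pass decides whether another pass is needed. A production system would additionally need persistent storage, dependency tracking between tasks, and failure handling, none of which are modeled here.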

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Energy Efficiency and Renewable Energy (EERE); USDOE Office of Science (SC), Basic Energy Sciences (BES). Joint Center for Energy Storage Research (JCESR); USDOE Office of Science (SC), Biological and Environmental Research (BER); European Union (EU)
Grant/Contract Number:
AC02-05CH11231; EDCBEE; HTforTCOs PCIG11‐GA‐2012‐321988
OSTI ID:
1474899
Alternate ID(s):
OSTI ID: 1401389
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; Related Information: © 2015 John Wiley & Sons, Ltd.; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 287 works (citation information provided by Web of Science)


