DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A characterization of workflow management systems for extreme-scale applications

Journal Article · Future Generation Computer Systems
 [1];  [2];  [3];  [4];  [5];  [1]
  1. University of Southern California, Marina del Rey, CA (United States)
  2. British Geological Survey, Lyell Centre, Edinburgh (United Kingdom); University of Edinburgh (United Kingdom). School of Informatics
  3. Univ. of Athens (Greece). Department of Informatics and Telecommunication
  4. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  5. Univ. of Manchester (United Kingdom). School of Computer Science

The automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC52-07NA27344; SC0012636; AC52-07NA27344 (LLNL-JRNL-706700); 16-ERD-036; DESC0012636
OSTI ID:
1408072
Alternate ID(s):
OSTI ID: 1495625
Report Number(s):
LLNL-JRNL-706700
Journal Information:
Future Generation Computer Systems, Vol. 75, Issue C; ISSN 0167-739X
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

Cited By (13)

HPC Application Cloudification: The StreamFlow Toolkit (Invited Paper) text January 2021
Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 diagnosis text January 2021
Workflow provenance in the lifecycle of scientific machine learning
  • Souza, Renan; Azevedo, Leonardo G.; Lourenço, Vítor
  • Concurrency and Computation: Practice and Experience, Vol. 34, Issue 14 https://doi.org/10.1002/cpe.6544
journal August 2021
Managing genomic variant calling workflows with Swift/T journal July 2019
Scientific workflows: Past, present and future journal October 2017
Exploiting Spark for HPC Simulation Data: Taming the Ephemeral Data Explosion
  • Jiang, Ming; Gallagher, Brian; Chu, Albert
  • HPCAsia2020: International Conference on High Performance Computing in Asia-Pacific Region, Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region https://doi.org/10.1145/3368474.3368482
conference January 2020
Managing genomic variant calling workflows with Swift/T journal January 2019
Highly Interactive, Steered Scientific Workflows on HPC Systems: Optimizing Design Solutions
  • Ossyra, John R.; Sedova, Ada; Baker, Matthew B.
  • High Performance Computing: ISC High Performance 2019 International Workshops, Frankfurt, Germany, June 16-20, 2019, Revised Selected Papers, p. 514-527 https://doi.org/10.1007/978-3-030-34356-9_39
book December 2019
Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data journal January 2017
Ranking open source application integration frameworks based on maintainability metrics: A review of five‐year evolution journal July 2019
BIGGR: Bringing Gradoop to Applications journal February 2019
Scientific workflows applied to the coupling of a continuum (Elmer v8.3) and a discrete element (HiDEM v1.0) ice dynamic model journal January 2019