U.S. Department of Energy
Office of Scientific and Technical Information

Exascale workflow applications and middleware: An ExaWorks retrospective

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [3];  [4];  [5];  [6];  [7];  [2];  [2];  [3];  [8];  [9]
  1. Rutgers Univ., New Brunswick, NJ (United States)
  2. Argonne National Laboratory (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
  3. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
  4. Incomputable LLC, Highland Park, NJ (United States)
  5. Brookhaven National Laboratory (BNL), Upton, NY (United States)
  6. Rutgers Univ., New Brunswick, NJ (United States); Brookhaven National Laboratory (BNL), Upton, NY (United States)
  7. Argonne National Laboratory (ANL), Argonne, IL (United States)
  8. Rutgers Univ., New Brunswick, NJ (United States); Princeton Plasma Physics Laboratory (PPPL), Princeton, NJ (United States); Princeton Univ., NJ (United States)
  9. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Exascale computers offer transformative capabilities to combine data-driven, learning-based approaches with traditional simulation applications, accelerating scientific discovery and insight. However, such software combinations and integrations are difficult to achieve because of the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. Here, we present the ExaWorks project, which addresses many of these challenges. We developed a workflow Software Development Kit (SDK), a curated collection of workflow technologies that can be composed and interoperated through a common interface, engineered following current best practices, and specifically designed to work on HPC platforms. ExaWorks also developed PSI/J, a job management abstraction API that simplifies the construction of portable software components and applications usable across various HPC schedulers. The PSI/J API is a minimal interface for submitting jobs and monitoring their execution state across multiple commonly used HPC schedulers. We also describe several leading and innovative examples of workflows built with ExaWorks tools on DOE leadership platforms. Furthermore, we discuss how our project is working with the workflow community, large computing facilities, and HPC platform vendors to sustainably address the requirements of workflows at exascale.
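The scheduler-agnostic job abstraction the abstract describes can be illustrated with a minimal sketch. Note that the names below (JobSpec, JobExecutor, LocalExecutor, submit, wait, JobState) are hypothetical stand-ins chosen for this example, not PSI/J's actual interface; in a real system, each concrete executor would translate the same calls to Slurm, LSF, PBS, or Flux rather than to a local process.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List


class JobState(Enum):
    """Coarse lifecycle states shared across schedulers."""
    QUEUED = auto()
    ACTIVE = auto()
    COMPLETED = auto()
    FAILED = auto()


@dataclass
class JobSpec:
    """Scheduler-independent description of a single job."""
    executable: str
    arguments: List[str] = field(default_factory=list)


class JobExecutor(ABC):
    """The common surface a portable component codes against; each
    concrete subclass adapts it to one scheduler backend."""

    @abstractmethod
    def submit(self, spec: JobSpec) -> str:
        """Hand the job to the underlying scheduler; return a job id."""

    @abstractmethod
    def wait(self, job_id: str) -> JobState:
        """Block until the job finishes; return its terminal state."""


class LocalExecutor(JobExecutor):
    """Toy backend that runs each job as a local process (a stand-in
    for a real scheduler adapter)."""

    def __init__(self) -> None:
        self._procs: Dict[str, subprocess.Popen] = {}

    def submit(self, spec: JobSpec) -> str:
        proc = subprocess.Popen([spec.executable, *spec.arguments])
        self._procs[str(proc.pid)] = proc
        return str(proc.pid)

    def wait(self, job_id: str) -> JobState:
        rc = self._procs[job_id].wait()
        return JobState.COMPLETED if rc == 0 else JobState.FAILED


if __name__ == "__main__":
    ex = LocalExecutor()
    jid = ex.submit(JobSpec(executable=sys.executable,
                            arguments=["-c", "print('hello from a job')"]))
    print(ex.wait(jid).name)  # COMPLETED
```

The point of such a minimal interface is that application code depends only on the abstract surface, so the same workflow script can target a laptop or a leadership-class machine by swapping the executor.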
Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Princeton Plasma Physics Laboratory (PPPL), Princeton, NJ (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357; AC05-00OR22725; AC52-07NA27344; SC0012704
OSTI ID:
2573590
Alternate ID(s):
OSTI ID: 2563004
OSTI ID: 2572818
OSTI ID: 2586620
Report Number(s):
BNL--228414-2025-JAAM
Journal Information:
International Journal of High Performance Computing Applications, Vol. 39, Issue 4; ISSN 1094-3420, 1741-2846
Publisher:
SAGE
Country of Publication:
United States
Language:
English

References (23)

The INTERSECT Open Federated Architecture for the Laboratory of the Future
  • Engelmann, Christian; Kuchar, Olga; Boehm, Swen
  • Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation, p. 173-190 https://doi.org/10.1007/978-3-031-23606-8_11
book · January 2023
What Is the Price of Simplicity? · book · January 2010
Flux: Overcoming scheduling challenges for exascale workflows · journal · September 2020
Using Machine Learning at scale in numerical simulations with SmartSim: An application to ocean climate modeling · journal · July 2022
Swift/T: Large-Scale Application Composition via Distributed-Memory Dataflow Processing
  • Wozniak, J. M.; Armstrong, T. G.; Wilde, M.
  • 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid) https://doi.org/10.1109/CCGrid.2013.99
conference · May 2013
Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks · conference · August 2016
Flux: A Next-Generation Resource Management Framework for Large HPC Centers
  • Ahn, Dong H.; Garlick, Jim; Grondona, Mark
  • 2014 43rd International Conference on Parallel Processing Workshops (ICPPW) https://doi.org/10.1109/ICPPW.2014.15
conference · September 2014
Frontiers in Scientific Workflows: Pervasive Integration With High-Performance Computing · journal · August 2024
Middleware Building Blocks for Workflow Systems · journal · July 2019
Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing · conference · November 2021
Design and Performance Characterization of RADICAL-Pilot on Leadership-Class Platforms · journal · April 2022
ExaWorks: Workflows for Exascale · conference · November 2021
A Community Roadmap for Scientific Workflows Research and Development · conference · November 2021
RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms · conference · November 2022
PSI/J: A Portable Interface for Submitting, Monitoring, and Managing Jobs · conference · October 2023
A massively parallel infrastructure for adaptive multiscale simulations: modeling RAS initiation pathway for cancer
  • Di Natale, Francesco; Bhatia, Harsh; Carpenter, Timothy S.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19) https://doi.org/10.1145/3295500.3356197
conference · November 2019
Parsl: Pervasive Parallel Programming in Python
  • Babuji, Yadu; Foster, Ian; Wilde, Michael
  • Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '19 https://doi.org/10.1145/3307681.3325400
conference · January 2019
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads · conference · October 2021
ExaAM: Metal additive manufacturing simulation at the fidelity of the microstructure · journal · January 2022
GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics · journal · October 2023
Employing artificial intelligence to steer exascale workflows with colmena · journal · October 2024
Workflows Community Summit 2022: A Roadmap Revolution · report · March 2023
Workflows Community Summit 2024: Future Trends and Challenges in Scientific Workflows · report · October 2024

Similar Records

ExaWorks: Workflows for Exascale
Conference · November 2021 · OSTI ID: 1880770
