U.S. Department of Energy
Office of Scientific and Technical Information

ExaWorks: Workflows for Exascale

Conference

Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve because of the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow Software Development Toolkit (SDK) consisting of a wide range of workflow management tools that can be composed and can interoperate through common interfaces. We describe the initial set of tools and interfaces supported by the SDK, efforts to make them easier to apply to complex science challenges, and examples of their application to exemplar cases. Furthermore, we discuss how our project is working with the workflows community, large computing facilities, and HPC platform vendors to sustainably address the requirements of workflows at the exascale.

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (SC-21)
DOE Contract Number:
SC0012704
OSTI ID:
1880770
Report Number(s):
BNL-223247-2022-CPPJ
Resource Relation:
Conference: WORKS21: 16th Workshop on Workflows in Support of Large-Scale Science, held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, November 15, 2021
Country of Publication:
United States
Language:
English

Similar Records

ExaWorks: Workflows for Exascale
Conference · 2021 · 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS) · OSTI ID:1863883

Exascale workflow applications and middleware: An ExaWorks retrospective
Journal Article · 2025 · International Journal of High Performance Computing Applications · OSTI ID:2563004

ExaWorks software development kit: a robust and scalable collection of interoperable workflows technologies
Journal Article · 2024 · Frontiers in High Performance Computing · OSTI ID:2476608