OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: ExaWorks: Workflows for Exascale

Conference · 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS)
 [1];  [2];  [3];  [3];  [2];  [3];  [2];  [4];  [2];  [5];  [6];  [6];  [5];  [4];  [6];  [6]
  1. Rutgers Univ., Piscataway, NJ (United States)
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
  4. Rutgers Univ., Piscataway, NJ (United States); Brookhaven National Lab. (BNL), Upton, NY (United States)
  5. Brookhaven National Lab. (BNL), Upton, NY (United States)
  6. Argonne National Lab. (ANL), Argonne, IL (United States)

Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow Software Development Toolkit (SDK) consisting of a wide range of workflow management tools that can be composed and interoperate through common interfaces. We describe the initial set of tools and interfaces supported by the SDK, efforts to make them easier to apply to complex science challenges, and examples of their application to exemplar cases. Furthermore, we discuss how our project is working with the workflows community, large computing facilities as well as HPC platform vendors to sustainably address the requirements of workflows at the exascale.

Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
DOE Contract Number:
SC0012704; AC52-07NA27344; AC02-06CH11357
OSTI ID:
1863883
Report Number(s):
BNL-222941-2022-JAAM
Journal Information:
2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), Conference: 16th Workshop on Workflows in Support of Large-Scale Science, St. Louis, MO (United States), 15 Nov 2021
Country of Publication:
United States
Language:
English
