Exascale workflow applications and middleware: An ExaWorks retrospective
Journal Article · International Journal of High Performance Computing Applications
- Rutgers Univ., New Brunswick, NJ (United States)
- Argonne National Laboratory (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Incomputable LLC, Highland Park, NJ (United States)
- Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Rutgers Univ., New Brunswick, NJ (United States); Brookhaven National Laboratory (BNL), Upton, NY (United States)
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Rutgers Univ., New Brunswick, NJ (United States); Princeton Plasma Physics Laboratory (PPPL), Princeton, NJ (United States); Princeton Univ., NJ (United States)
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Exascale computers offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. However, these software combinations and integrations are difficult to achieve due to the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. Here, we present the ExaWorks project, which addresses many of these challenges. We developed a workflow Software Development Toolkit (SDK), a curated collection of workflow technologies that can be composed and interoperated through a common interface, engineered following current best practices, and specifically designed to work on HPC platforms. ExaWorks also developed PSI/J, a job management abstraction API, to simplify the construction of portable software components and applications that can be used over various HPC schedulers. The PSI/J API is a minimal interface for submitting and monitoring jobs and their execution state across multiple and commonly used HPC schedulers. We also describe several leading and innovative workflow examples of ExaWorks tools used on DOE leadership platforms. Furthermore, we discuss how our project is working with the workflow community, large computing facilities, and HPC platform vendors to address the requirements of workflows sustainably at the exascale.
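The abstract describes PSI/J as a minimal interface for submitting jobs and monitoring their execution state across schedulers. The sketch below illustrates that general pattern only; it does not reproduce the actual PSI/J API. Every name in it (`JobSpec`, `Job`, `JobState`, `LocalExecutor`, `get_executor`) is a hypothetical stand-in chosen for illustration, and the only backend shown runs a local process rather than talking to an HPC scheduler.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class JobState(Enum):
    """Hypothetical, scheduler-independent job states."""
    NEW = "new"
    ACTIVE = "active"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class JobSpec:
    """What to run; deliberately says nothing about where it runs."""
    executable: str
    arguments: List[str] = field(default_factory=list)


class Job:
    """Pairs a spec with its current execution state."""

    def __init__(self, spec: JobSpec) -> None:
        self.spec = spec
        self.state = JobState.NEW
        self.exit_code: Optional[int] = None
        self._proc: Optional[subprocess.Popen] = None


class LocalExecutor:
    """Assumed backend: runs the job as a local child process."""

    def submit(self, job: Job) -> None:
        job._proc = subprocess.Popen([job.spec.executable, *job.spec.arguments])
        job.state = JobState.ACTIVE

    def wait(self, job: Job) -> JobState:
        if job._proc is None:
            raise RuntimeError("job was never submitted")
        job.exit_code = job._proc.wait()
        job.state = JobState.COMPLETED if job.exit_code == 0 else JobState.FAILED
        return job.state


# Registry of backends; a scheduler-specific executor would be added here.
EXECUTORS = {"local": LocalExecutor}


def get_executor(name: str):
    """Look up a backend by name so calling code stays scheduler-agnostic."""
    if name not in EXECUTORS:
        raise ValueError(f"unknown executor: {name}")
    return EXECUTORS[name]()


if __name__ == "__main__":
    job = Job(JobSpec(sys.executable, ["-c", "print('hello from a job')"]))
    executor = get_executor("local")
    executor.submit(job)
    print(executor.wait(job))
```

In a design like this, the calling code would change only the executor name to target a different scheduler, which is the portability property the abstract attributes to PSI/J.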
- Research Organization:
- Brookhaven National Laboratory (BNL), Upton, NY (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Princeton Plasma Physics Laboratory (PPPL), Princeton, NJ (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-06CH11357; AC05-00OR22725; AC52-07NA27344; SC0012704
- OSTI ID:
- 2573590
- Alternate ID(s):
- OSTI ID: 2563004; OSTI ID: 2572818; OSTI ID: 2586620
- Report Number(s):
- BNL--228414-2025-JAAM
- Journal Information:
- International Journal of High Performance Computing Applications, Vol. 39, Issue 4; ISSN 1741-2846; ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
Similar Records
ExaWorks: Workflows for Exascale
Conference · Sun Nov 14 23:00:00 EST 2021 · OSTI ID: 1880770