Accelerating Scientific Workflows on HPC Platforms with In Situ Processing
Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows-executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computations can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mu-tational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.
- Research Organization:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- DOE Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1888792
- Resource Relation:
- Conference: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 05/16/22 - 05/19/22, Messina, IT
- Country of Publication:
- United States
- Language:
- English
Similar Records
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
Enabling HPC Scientific Workflows for Serverless