U.S. Department of Energy
Office of Scientific and Technical Information

Accelerating Scientific Workflows on HPC Platforms with In Situ Processing

Conference

Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows: executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computations can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.

Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1888792
Country of Publication:
United States
Language:
English
