Title: Accelerating Scientific Workflows on HPC Platforms with In Situ Processing

Conference

Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows: they execute computational jobs and orchestrate data transfers between those jobs on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted an approach called in situ processing, which bypasses costly parallel filesystem I/O operations in favor of faster in-memory or in-network communications. With in situ approaches, communication and computation can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., a set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.
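
The contrast between file-based and in situ coupling described in the abstract can be illustrated with a minimal Python sketch. It is not the Decaf or Pegasus API and does not use MPI: the function names (`produce`, `consume`, `run_with_files`, `run_in_memory`) are hypothetical, and a thread-safe in-memory queue stands in for the MPI channel purely for illustration. The file variant finishes all writes before any read begins, while the in-memory variant lets the consumer work on one chunk while the producer prepares the next.

```python
import os
import queue
import tempfile
import threading


def produce(n):
    """Hypothetical producer job: yields n small chunks of data."""
    for i in range(n):
        yield [i] * 4


def consume(chunk):
    """Hypothetical consumer job: reduces one chunk to a single number."""
    return sum(chunk)


def run_with_files(n):
    """File-based coupling: every chunk is written to disk, and the
    consumer starts reading only after the producer has finished."""
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        for i, chunk in enumerate(produce(n)):
            path = os.path.join(workdir, f"chunk_{i}.txt")
            with open(path, "w") as f:
                f.write(" ".join(map(str, chunk)))
        for i in range(n):
            path = os.path.join(workdir, f"chunk_{i}.txt")
            with open(path) as f:
                results.append(consume([int(x) for x in f.read().split()]))
    return results


def run_in_memory(n):
    """In-memory coupling (a stand-in for message passing): chunks flow
    through a bounded queue, so production and consumption can overlap."""
    channel = queue.Queue(maxsize=2)
    results = []

    def producer():
        for chunk in produce(n):
            channel.put(chunk)
        channel.put(None)  # sentinel: no more data

    def consumer():
        while True:
            chunk = channel.get()
            if chunk is None:
                break
            results.append(consume(chunk))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both variants compute the same results; only the transport between the two jobs differs. This sketch says nothing about the performance figures reported in the abstract, which come from the authors' experiments on HPC platforms.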

Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1888792
Resource Relation:
Conference: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 05/16/22 - 05/19/22, Messina, IT
Country of Publication:
United States
Language:
English