U.S. Department of Energy
Office of Scientific and Technical Information

Performance Characterization and Provenance of Distributed Task-based Workflows on HPC Platforms

Conference
Understanding the performance and provenance of task-based workflows poses significant challenges, particularly in distributed configurations where multiple applications share resources. Task-based workflow management systems further complicate performance predictability because of their dynamic behavior, which subtly alters task execution order from run to run. In this paper we propose a layered framework for characterizing the performance and task provenance of Dask.distributed workflows running on high-performance computing (HPC) platforms. The framework collects data from jobs, the workflow management system, and the operating system to help explain the performance of these workflows. Our approach makes three main contributions: first, an extension of Dask.distributed that captures high-fidelity task provenance using Mochi data services; second, an adaptation of Darshan, an established HPC I/O characterization tool, to gather high-fidelity I/O data, which increases the granularity of our analysis; and third, a framework that combines and processes the collected data to provide insights into performance characterization and reproducibility. We also share the lessons we learned.
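The third contribution, combining data collected at different layers, can be illustrated with a minimal sketch. It is not the authors' framework and does not use Mochi or Darshan. The record types `TaskRecord` and `IORecord`, their fields, and the functions `attribute_io` and `order_divergence` are hypothetical, and the sketch assumes that task and I/O records share a worker name and a common clock. It credits each I/O operation to the task (or tasks) running on the same worker when the operation started. It also measures how much the relative start order of tasks differs between two runs, which is one simple way to quantify the run-to-run variability described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TaskRecord:
    """One task execution from the workflow layer (hypothetical schema)."""
    key: str
    worker: str
    start: float  # seconds on a clock assumed to be shared with I/O records
    stop: float


@dataclass(frozen=True)
class IORecord:
    """One I/O operation from the I/O layer (hypothetical schema)."""
    worker: str
    start: float
    nbytes: int


def attribute_io(tasks, io_records):
    """Map task key -> bytes by crediting each I/O op to the task(s) running
    on the same worker when the op started. Ops that match no task go under
    "<unattributed>"; ops that match several concurrent tasks (e.g. a
    multi-threaded worker) are split evenly among them."""
    by_worker = defaultdict(list)
    for t in tasks:
        by_worker[t.worker].append(t)
    totals = defaultdict(float)
    for op in io_records:
        owners = [t for t in by_worker[op.worker] if t.start <= op.start < t.stop]
        if not owners:
            totals["<unattributed>"] += op.nbytes
            continue
        share = op.nbytes / len(owners)
        for t in owners:
            totals[t.key] += share
    return dict(totals)


def order_divergence(run_a, run_b):
    """Fraction of task pairs (present in both runs) whose relative start
    order differs between the two runs; 0.0 means identical ordering."""
    start_a = {t.key: t.start for t in run_a}
    start_b = {t.key: t.start for t in run_b}
    common = sorted(start_a.keys() & start_b.keys())
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    flipped = sum(
        (start_a[x] < start_a[y]) != (start_b[x] < start_b[y]) for x, y in pairs
    )
    return flipped / len(pairs)
```

For example, if two loading tasks swap their start order between runs while a final reduction still starts last, one of the three task pairs is flipped and `order_divergence` returns 1/3. Attributing I/O by start time is a deliberately simple choice. A real tool would need finer-grained signals, such as thread identifiers, to disambiguate concurrent tasks.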
Research Organization: Argonne National Laboratory (ANL)
Sponsoring Organization: US Department of Energy; USDOE Office of Science; USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR)
DOE Contract Number: AC02-06CH11357
OSTI ID: 2588773
Country of Publication: United States
Language: English