Modular performance prediction for scientific workflows using Machine Learning
Journal Article · Future Generations Computer Systems
- Univ. of California, San Diego, La Jolla, CA (United States). San Diego Supercomputer Center; Univ. of California, San Diego, La Jolla, CA (United States)
- Univ. of California, San Diego, La Jolla, CA (United States). San Diego Supercomputer Center
Scientific workflows provide an intuitive and efficient way to design computational experiments declaratively. A distributed workflow typically runs on a variety of resources and uses a variety of computational algorithms or tools to achieve the desired outcomes. This variety adds complexity to scheduling such workflows on large-scale computers. As computation becomes more distributed, insight into the expected workload a workflow presents becomes critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning to create precise performance predictions for a workflow. The central idea is to partition a workflow so that forecasting each atomic unit is manageable and the individual predictions can be combined efficiently. We treat the combination of an executable and a specific physical resource as a single module, which lets us characterize workload and machine power as a single unit of prediction. Overall, our modular technique of creating atomic modules and using a longest-path approach to estimate workflow performance allows the framework to adapt to highly complex nested directed acyclic workflows and to scale to new scenarios, since it makes no assumptions about the underlying workflow structure. We present performance estimation results for independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.
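The longest-path combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each module, an (executable, resource) pair, already has a predicted runtime from some trained model, and that a module starts as soon as all of its predecessors finish. The function name `estimate_workflow_runtime`, the module names, and the runtimes are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def estimate_workflow_runtime(predicted_runtime, depends_on):
    """Estimate workflow runtime as the longest path through a DAG.

    predicted_runtime: dict mapping each module to its predicted runtime
        in seconds (assumed to come from a per-module ML model).
    depends_on: dict mapping a module to the modules that must finish
        before it can start; modules without an entry have no predecessors.
    """
    # Include every module, even those with no recorded dependencies.
    graph = {m: depends_on.get(m, ()) for m in predicted_runtime}
    finish = {}
    # static_order() yields each module only after all of its predecessors.
    for module in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[module]), default=0.0)
        finish[module] = start + predicted_runtime[module]
    return max(finish.values(), default=0.0)


# Hypothetical diamond-shaped workflow: one module fans out to two, then joins.
prep = ("prepare", "node-a")
sim = ("simulate", "node-b")
stats = ("summarize", "node-a")
merge = ("merge", "node-a")

runtimes = {prep: 10.0, sim: 20.0, stats: 5.0, merge: 7.0}
deps = {sim: {prep}, stats: {prep}, merge: {sim, stats}}

total = estimate_workflow_runtime(runtimes, deps)
```

In this example the estimate follows the chain prepare, simulate, merge, because the slower of the two parallel branches determines when the join can start. Nested or more complex DAGs need no special handling, since the traversal relies only on dependency order.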
- Research Organization:
- Univ. of California, San Diego, CA (United States)
- Sponsoring Organization:
- National Institutes of Health (NIH); National Science Foundation (NSF); USDOE; USDOE Office of Science (SC)
- Grant/Contract Number:
- SC0012630
- OSTI ID:
- 1851724
- Alternate ID(s):
- OSTI ID: 1776457
- Journal Information:
- Future Generations Computer Systems, Vol. 114, Issue C; ISSN 0167-739X
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
Similar Records
iDDS: intelligent distributed dispatch and scheduling for workflow orchestration
Journal Article · Fri Jan 23 19:00:00 EST 2026 · European Physical Journal. C, Particles and Fields (Online) · OSTI ID: 3017617
Asynchronous Execution of Heterogeneous Tasks in ML-Driven HPC Workflows
Conference · Fri May 19 00:00:00 EDT 2023 · OSTI ID: 2333668
Integration of scanning probe microscope with high-performance computing: Fixed-policy and reward-driven workflows implementation
Journal Article · Sun Sep 15 20:00:00 EDT 2024 · Review of Scientific Instruments · OSTI ID: 2571043