OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Modular performance prediction for scientific workflows using Machine Learning

Abstract

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large-scale computers. As computation becomes more distributed, insights into the expected workload that a workflow presents become critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning for creating precise performance predictions of a workflow. The central idea is to partition a workflow in such a way that makes the task of forecasting each atomic unit manageable and gives us a way to combine the individual predictions efficiently. We recognize a combination of an executable and a specific physical resource as a single module. This gives us a handle to characterize workload and machine power as a single unit of prediction. Overall, our modular technique of creating atomic modules and deploying a longest-path approach to estimate workflow performance allows the framework to adapt to highly complex nested directed acyclic workflows and scale to new scenarios, since it does not make assumptions about the underlying workflow structure. We present performance estimation results of independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.
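The modular idea in the abstract (one predictor per executable/resource pair, combined over the workflow DAG by a longest-path rule) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the per-module model is a hypothetical one-feature least-squares fit standing in for whichever ML algorithm is chosen, the module and task names (`align`, `merge`, `comet-compute`) are invented, and the aggregation assumes each task starts as soon as its slowest predecessor finishes, with no resource contention or data-transfer cost.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class LinearModulePredictor:
    """Hypothetical per-module model: least squares on one input feature
    (e.g. input size). A stand-in for any regression algorithm."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x):
        return self.intercept + self.slope * x


def predict_workflow_runtime(tasks, deps, predictors):
    """tasks: task id -> ((executable, resource), feature value)
    deps: task id -> set of predecessor task ids
    predictors: (executable, resource) -> fitted module predictor
    Returns (predicted makespan, critical path) under the longest-path rule."""
    order = TopologicalSorter({t: deps.get(t, set()) for t in tasks}).static_order()
    finish, via = {}, {}
    for t in order:  # predecessors are always visited before successors
        module, feature = tasks[t]
        runtime = predictors[module].predict(feature)
        preds = deps.get(t, set())
        # A task can start only after its slowest predecessor has finished.
        via[t] = max(preds, key=lambda p: finish[p], default=None)
        start = finish[via[t]] if via[t] is not None else 0.0
        finish[t] = start + runtime
    # Walk back from the latest-finishing task to recover the critical path.
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = via[node]
    return finish[end], path[::-1]


# Hypothetical training data: (input size, observed runtime) per module.
predictors = {
    ("align", "comet-compute"): LinearModulePredictor().fit([1, 2, 3], [12, 22, 32]),
    ("merge", "comet-compute"): LinearModulePredictor().fit([1, 2, 3], [5, 6, 7]),
}
# A diamond-shaped workflow: A -> {B, C} -> D.
tasks = {
    "A": (("align", "comet-compute"), 4),
    "B": (("align", "comet-compute"), 1),
    "C": (("merge", "comet-compute"), 2),
    "D": (("merge", "comet-compute"), 3),
}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

runtime, path = predict_workflow_runtime(tasks, deps, predictors)
print(runtime, path)  # 61.0 ['A', 'B', 'D']
```

Keying predictors by the (executable, resource) pair mirrors the abstract's point that workload and machine power are characterized together, so the same executable on a different machine would simply get its own entry in `predictors`.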

Authors:
Singh, Alok [1]; Purawat, Shweta [1]; Rao, Arvind [1]; Altintas, Ilkay [1]
  1. Univ. of California, San Diego, La Jolla, CA (United States). San Diego Supercomputer Center
Publication Date:
May 2020
Research Org.:
Univ. of California, San Diego, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
OSTI Identifier:
1851724
Alternate Identifier(s):
OSTI ID: 1776457
Grant/Contract Number:  
SC0012630; DBI 1062565; DBI 1331615; P41 GM103426
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Future Generation Computer Systems
Additional Journal Information:
Journal Volume: 114; Journal Issue: C; Journal ID: ISSN 0167-739X
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Computer Science; Machine learning; Scientific workflows; Performance prediction; Parallel computing; Distributed computing; Exascale computing

Citation Formats

Singh, Alok, Purawat, Shweta, Rao, Arvind, and Altintas, Ilkay. Modular performance prediction for scientific workflows using Machine Learning. United States: N. p., 2020. Web. doi:10.1016/j.future.2020.04.048.
Singh, Alok, Purawat, Shweta, Rao, Arvind, & Altintas, Ilkay. Modular performance prediction for scientific workflows using Machine Learning. United States. https://doi.org/10.1016/j.future.2020.04.048
Singh, Alok, Purawat, Shweta, Rao, Arvind, and Altintas, Ilkay. 2020. "Modular performance prediction for scientific workflows using Machine Learning". United States. https://doi.org/10.1016/j.future.2020.04.048. https://www.osti.gov/servlets/purl/1851724.
@article{osti_1851724,
title = {Modular performance prediction for scientific workflows using Machine Learning},
author = {Singh, Alok and Purawat, Shweta and Rao, Arvind and Altintas, Ilkay},
abstractNote = {Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large scale computers. As computation becomes more distributed, insights into expected workload that a workflow presents become critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning for creating precise performance predictions of a workflow. The central idea is to partition a workflow in such a way that makes the task of forecasting each atomic unit manageable and gives us a way to combine the individual predictions efficiently. We recognize a combination of an executable and a specific physical resource as a single module. This gives us a handle to characterize workload and machine power as a single unit of prediction. Overall, our modular technique of creating atomic modules and deployment of longest-path approach to estimate workflow performance, allows the framework to adapt to highly complex nested directed acyclic workflows and scale to new scenarios, since it does not make assumptions of underlying workflow structure. We present performance estimation results of independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.},
doi = {10.1016/j.future.2020.04.048},
url = {https://www.osti.gov/biblio/1851724},
journal = {Future Generation Computer Systems},
issn = {0167-739X},
number = {C},
volume = 114,
place = {United States},
year = {2020},
month = {5}
}
