Modular performance prediction for scientific workflows using Machine Learning
Abstract
Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large scale computers. As computation becomes more distributed, insights into expected workload that a workflow presents become critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning for creating precise performance predictions of a workflow. The central idea is to partition a workflow in such a way that makes the task of forecasting each atomic unit manageable and gives us a way to combine the individual predictions efficiently. We recognize a combination of an executable and a specific physical resource as a single module. This gives us a handle to characterize workload and machine power as a single unit of prediction. Overall, our modular technique of creating atomic modules and deployment of longest-path approach to estimate workflow performance, allows the framework to adapt to highly complex nested directed acyclic workflows and scale to new scenarios, since it does not make assumptions of underlying workflow structure. We present performance estimation results of independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.
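The longest-path combination step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each module, an (executable, resource) pair, already has a predicted runtime from some trained model, and that the workflow is given as a list of dependency edges. The function name `estimate_makespan`, the module names, and the runtime values are all hypothetical placeholders.

```python
from collections import defaultdict, deque


def estimate_makespan(predicted_runtime, edges):
    """Estimate workflow runtime as the longest path through a DAG.

    predicted_runtime: dict mapping module -> predicted runtime (seconds).
        A module here is an (executable, resource) tuple (assumed layout).
    edges: iterable of (upstream, downstream) pairs; both ends must be
        keys of predicted_runtime.
    """
    successors = defaultdict(list)
    indegree = {module: 0 for module in predicted_runtime}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        indegree[downstream] += 1

    # Earliest finish time of each module; a module with no
    # predecessors finishes after its own predicted runtime.
    finish = dict(predicted_runtime)

    # Process modules in topological order (Kahn's algorithm).
    ready = deque(m for m, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for nxt in successors[current]:
            finish[nxt] = max(finish[nxt],
                              finish[current] + predicted_runtime[nxt])
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if visited != len(predicted_runtime):
        raise ValueError("workflow graph contains a cycle")
    return max(finish.values(), default=0.0)


# Hypothetical per-module predictions (placeholder values, not results
# from the paper).
align = ("align", "cpu-node")
filt = ("filter", "cpu-node")
assemble = ("assemble", "gpu-node")
report = ("report", "cpu-node")

predicted = {align: 120.0, filt: 30.0, assemble: 200.0, report: 10.0}
dependencies = [(align, filt), (align, assemble),
                (filt, report), (assemble, report)]

makespan = estimate_makespan(predicted, dependencies)
print(makespan)  # 330.0, via align -> assemble -> report
```

Because the traversal only needs a dependency list, the same routine applies to any acyclic workflow shape, which is in the spirit of the abstract's claim that the approach makes no assumptions about the underlying workflow structure.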
- Authors:
- Singh, Alok; Purawat, Shweta; Rao, Arvind; Altintas, Ilkay
- Univ. of California, San Diego, La Jolla, CA (United States). San Diego Supercomputer Center
- Publication Date:
- Research Org.:
- Univ. of California, San Diego, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
- OSTI Identifier:
- 1851724
- Alternate Identifier(s):
- OSTI ID: 1776457
- Grant/Contract Number:
- SC0012630; DBI 1062565; DBI 1331615; P41 GM103426
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- Future Generation Computer Systems
- Additional Journal Information:
- Journal Volume: 114; Journal Issue: C; Journal ID: ISSN 0167-739X
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Computer Science; Machine learning; Scientific workflows; Performance prediction; Parallel computing; Distributed computing; Exascale computing
Citation Formats
Singh, Alok, Purawat, Shweta, Rao, Arvind, and Altintas, Ilkay. Modular performance prediction for scientific workflows using Machine Learning. United States: N. p., 2020. Web. doi:10.1016/j.future.2020.04.048.
Singh, Alok, Purawat, Shweta, Rao, Arvind, & Altintas, Ilkay. Modular performance prediction for scientific workflows using Machine Learning. United States. https://doi.org/10.1016/j.future.2020.04.048
Singh, Alok, Purawat, Shweta, Rao, Arvind, and Altintas, Ilkay. 2020. "Modular performance prediction for scientific workflows using Machine Learning". United States. https://doi.org/10.1016/j.future.2020.04.048. https://www.osti.gov/servlets/purl/1851724.
@article{osti_1851724,
title = {Modular performance prediction for scientific workflows using Machine Learning},
author = {Singh, Alok and Purawat, Shweta and Rao, Arvind and Altintas, Ilkay},
abstractNote = {Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large scale computers. As computation becomes more distributed, insights into expected workload that a workflow presents become critical for effective resource allocation. In this paper, we present a modular framework that leverages Machine Learning for creating precise performance predictions of a workflow. The central idea is to partition a workflow in such a way that makes the task of forecasting each atomic unit manageable and gives us a way to combine the individual predictions efficiently. We recognize a combination of an executable and a specific physical resource as a single module. This gives us a handle to characterize workload and machine power as a single unit of prediction. Overall, our modular technique of creating atomic modules and deployment of longest-path approach to estimate workflow performance, allows the framework to adapt to highly complex nested directed acyclic workflows and scale to new scenarios, since it does not make assumptions of underlying workflow structure. We present performance estimation results of independent workflow modules executed on the XSEDE SDSC Comet cluster using various Machine Learning algorithms. The results provide insights into the behavior and effectiveness of different algorithms in the context of scientific workflow performance prediction.},
doi = {10.1016/j.future.2020.04.048},
url = {https://www.osti.gov/biblio/1851724},
journal = {Future Generation Computer Systems},
issn = {0167-739X},
number = {C},
volume = 114,
place = {United States},
year = {2020},
month = {5}
}
Works referenced in this record:
Biomedical Big Data Training Collaborative (BBDTC): An effort to bridge the talent gap in biomedical science and research
journal, May 2017
- Purawat, Shweta; Cowart, Charles; Amaro, Rommie E.
- Journal of Computational Science, Vol. 20
Prophesy: an infrastructure for performance analysis and modeling of parallel and grid applications
journal, March 2003
- Taylor, Valerie; Wu, Xingfu; Stevens, Rick
- ACM SIGMETRICS Performance Evaluation Review, Vol. 30, Issue 4
Support-vector networks
journal, September 1995
- Cortes, Corinna; Vapnik, Vladimir
- Machine Learning, Vol. 20, Issue 3
Characterizing and profiling scientific workflows
journal, March 2013
- Juve, Gideon; Chervenak, Ann; Deelman, Ewa
- Future Generation Computer Systems, Vol. 29, Issue 3
Large memory high performance computing enables comparison across human gut microbiome of patients with autoimmune diseases and healthy subjects
conference, July 2013
- Wu, Sitao; Li, Weizhong; Smarr, Larry
- Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE '13)
Workflows and e-Science: An overview of workflow system features and capabilities
journal, May 2009
- Deelman, Ewa; Gannon, Dennis; Shields, Matthew
- Future Generation Computer Systems, Vol. 25, Issue 5
A tutorial on support vector regression
journal, August 2004
- Smola, Alex J.; Schölkopf, Bernhard
- Statistics and Computing, Vol. 14, Issue 3
A regression-based approach to scalability prediction
conference, January 2008
- Barnes, Bradley J.; Rountree, Barry; Lowenthal, David K.
- Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
Milepost GCC: Machine Learning Enabled Self-tuning Compiler
journal, January 2011
- Fursin, Grigori; Kashnikov, Yuriy; Memon, Abdul Wahid
- International Journal of Parallel Programming, Vol. 39, Issue 3
A novel statistical time-series pattern based interval forecasting strategy for activity durations in workflow systems
journal, March 2011
- Liu, Xiao; Ni, Zhiwei; Yuan, Dong
- Journal of Systems and Software, Vol. 84, Issue 3
Kepler + CometCloud: Dynamic Scientific Workflow Execution on Federated Cloud Resources
journal, January 2016
- Wang, Jianwu; AbdelBaky, Moustafa; Diaz-Montes, Javier
- Procedia Computer Science, Vol. 80
Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper
conference, January 2012
- Altintas, Ilkay; Wang, Jianwu; Crawl, Daniel
- Proceedings of the 2012 Joint EDBT/ICDT Workshops on - EDBT-ICDT '12
Predicting the Execution Time of Workflow Activities Based on Their Input Features
conference, November 2012
- Miu, Tudor; Missier, Paolo
- 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
The future of scientific workflows
journal, April 2017
- Deelman, Ewa; Peterka, Tom; Altintas, Ilkay
- The International Journal of High Performance Computing Applications, Vol. 32, Issue 1
Scientific workflow management and the Kepler system
journal, January 2006
- Ludäscher, Bertram; Altintas, Ilkay; Berkley, Chad
- Concurrency and Computation: Practice and Experience, Vol. 18, Issue 10
On Performance Modeling and Prediction in Support of Scientific Workflow Optimization
conference, July 2011
- Wu, Qishi; Datla, Vivek V.
- 2011 IEEE World Congress on Services (SERVICES)
Analysis of benchmark characteristics and benchmark performance prediction
journal, November 1996
- Saavedra, Rafael H.; Smith, Alan J.
- ACM Transactions on Computer Systems, Vol. 14, Issue 4
A multi-strategy collaborative prediction model for the runtime of online tasks in computing cluster/grid
journal, October 2010
- Tao, Ming; Dong, Shoubin; Zhang, Liping
- Cluster Computing, Vol. 14, Issue 2