The role of machine learning in scientific workflows
Abstract
Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity.
- Authors:
- Deelman, Ewa; Mandal, Anirban; Jiang, Ming; Sakellariou, Rizos
- University of Southern California Information Sciences Institute, Marina Del Rey, CA, USA
- Renaissance Computing Institute, Chapel Hill, NC, USA
- Lawrence Livermore National Laboratory, Livermore, CA, USA
- The University of Manchester, Manchester, UK
- Publication Date:
- 2019-05-21
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Univ. of Southern California, Marina Del Rey, CA (United States). Information Sciences Inst.
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1523666
- Alternate Identifier(s):
- OSTI ID: 1548331; OSTI ID: 1600091
- Report Number(s):
- LLNL-JRNL-765200; Journal ID: ISSN 1094-3420
- Grant/Contract Number:
- SC0012636; AC52-07NA27344
- Resource Type:
- Published Article
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Name: International Journal of High Performance Computing Applications; Journal Volume: 33; Journal Issue: 6; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Scientific workflows; machine learning; workflow systems; anomaly detection; workflow composition
Citation Formats
Deelman, Ewa, Mandal, Anirban, Jiang, Ming, and Sakellariou, Rizos. The role of machine learning in scientific workflows. United States: N. p., 2019. Web. doi:10.1177/1094342019852127.
Deelman, Ewa, Mandal, Anirban, Jiang, Ming, & Sakellariou, Rizos. The role of machine learning in scientific workflows. United States. https://doi.org/10.1177/1094342019852127
Deelman, Ewa, Mandal, Anirban, Jiang, Ming, and Sakellariou, Rizos. 2019. "The role of machine learning in scientific workflows". United States. https://doi.org/10.1177/1094342019852127.
@article{osti_1523666,
title = {The role of machine learning in scientific workflows},
author = {Deelman, Ewa and Mandal, Anirban and Jiang, Ming and Sakellariou, Rizos},
abstractNote = {Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity.},
doi = {10.1177/1094342019852127},
journal = {International Journal of High Performance Computing Applications},
number = 6,
volume = 33,
place = {United States},
year = {2019},
month = {5}
}