PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
Abstract
Computational science is well established as the third pillar of scientific discovery, on par with experimentation and theory. However, as we move closer to executing exascale calculations and processing the resulting extreme-scale volumes of data produced by both experiments and computations, the complexity of managing compute and data analysis tasks has grown beyond the capabilities of domain scientists. Workflow management systems are therefore essential to current and future scientific discovery. A key research question for these systems is how to optimize the performance of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus, with the goal of understanding and improving the overall performance of complex scientific workflows.
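The abstract does not spell out PANORAMA's models, but a minimal sketch can illustrate the general idea of analytically modeling and then diagnosing a workflow. It assumes the workflow is a directed acyclic graph (DAG) of tasks with estimated runtimes, unlimited compute slots, and no data-transfer or queueing delay; under those assumptions the predicted makespan is the length of the critical path, and tasks whose measured runtimes exceed the prediction by a chosen tolerance become candidates for diagnosis. The function names, task names, runtimes, and the 50% tolerance are all hypothetical, and this is not the PANORAMA or Pegasus implementation.

```python
from graphlib import TopologicalSorter


def critical_path_makespan(runtimes, deps):
    """Estimate a workflow's makespan as the longest path through its task DAG.

    runtimes: task name -> estimated runtime in seconds (every task and parent needs one)
    deps:     task name -> parent tasks that must finish before it starts
    Assumes unlimited compute slots and zero data-transfer/queueing delay.
    """
    graph = {task: deps.get(task, ()) for task in runtimes}
    finish = {}
    # static_order() yields a task only after all of its parents
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0.0)
        finish[task] = start + runtimes[task]
    return max(finish.values(), default=0.0), finish


def flag_slow_tasks(predicted, observed, tolerance=0.5):
    """List tasks whose observed runtime exceeds the model by more than `tolerance` (a fraction)."""
    return sorted(
        task for task, seconds in observed.items()
        if task in predicted and seconds > predicted[task] * (1 + tolerance)
    )


# Hypothetical four-task workflow: stage-in, two parallel analyses, merge.
runtimes = {"stage_in": 10, "analyze_a": 30, "analyze_b": 45, "merge": 5}
deps = {"analyze_a": ["stage_in"], "analyze_b": ["stage_in"],
        "merge": ["analyze_a", "analyze_b"]}

makespan, finish = critical_path_makespan(runtimes, deps)
print(makespan)  # 60.0 (stage_in -> analyze_b -> merge)

observed = {"stage_in": 12, "analyze_a": 31, "analyze_b": 90, "merge": 5}
print(flag_slow_tasks(runtimes, observed))  # ['analyze_b']
```

The critical-path bound is only a lower bound on real run time: a model intended for extreme-scale systems would also have to account for limited resources, data movement, and I/O contention, which this sketch ignores.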
- Authors:
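- Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; Tierney, Brian; Vetter, Jeffrey S.; Baldin, Ilya; Castillo, Claris; Juve, Gideon; Krol, Dariusz; Lynch, Vickie; Mayer, Ben; Meredith, Jeremy; Proffen, Thomas; Ruth, Paul; Ferreira da Silva, Rafael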
- Univ. of Southern California, Los Angeles, CA (United States)
- Rensselaer Polytechnic Inst., Troy, NY (United States)
- Univ. of North Carolina, Chapel Hill, NC (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Univ. of Southern California, Los Angeles, CA (United States); AGH - Univ. of Science and Technology, Krakow (Poland)
- Publication Date:
- July 2015
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source (SNS)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1265426
- Grant/Contract Number:
- AC05-00OR22725; SC0012636
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Name: International Journal of High Performance Computing Applications; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; performance modeling; extreme scale; scientific workflow
Citation Formats
Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, and Ferreira da Silva, Rafael. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows. United States: N. p., 2015.
Web. doi:10.1177/1094342015594515.
Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, & Ferreira da Silva, Rafael. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows. United States. https://doi.org/10.1177/1094342015594515
Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, and Ferreira da Silva, Rafael. 2015.
"PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows". United States. https://doi.org/10.1177/1094342015594515. https://www.osti.gov/servlets/purl/1265426.
@article{osti_1265426,
title = {PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows},
author = {Deelman, Ewa and Carothers, Christopher and Mandal, Anirban and Tierney, Brian and Vetter, Jeffrey S. and Baldin, Ilya and Castillo, Claris and Juve, Gideon and Krol, Dariusz and Lynch, Vickie and Mayer, Ben and Meredith, Jeremy and Proffen, Thomas and Ruth, Paul and Ferreira da Silva, Rafael},
abstractNote = {Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.},
doi = {10.1177/1094342015594515},
url = {https://www.osti.gov/biblio/1265426},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
place = {United States},
year = {2015},
month = {7}
}