
Title: PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

Abstract

Computational science is well established as the third pillar of scientific discovery, on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Workflow management systems are therefore necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

Authors:
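Deelman, Ewa [1]; Carothers, Christopher [2]; Mandal, Anirban [3]; Tierney, Brian [4]; Vetter, Jeffrey S. [5]; Baldin, Ilya [3]; Castillo, Claris [3]; Juve, Gideon [1]; Krol, Dariusz [6]; Lynch, Vickie [5]; Mayer, Ben [5]; Meredith, Jeremy [5]; Proffen, Thomas [5]; Ruth, Paul [3]; Ferreira da Silva, Rafael [1]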
  1. Univ. of Southern California, Los Angeles, CA (United States)
  2. Rensselaer Polytechnic Inst., Troy, NY (United States)
  3. Univ. of North Carolina, Chapel Hill, NC (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  6. Univ. of Southern California, Los Angeles, CA (United States); AGH - Univ. of Science and Technology, Krakow (Poland)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source (SNS)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1265426
Grant/Contract Number:  
AC05-00OR22725; SC0012636
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; performance modeling; extreme scale; scientific workflow

Citation Formats

Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, and Ferreira da Silva, Rafael. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows. United States: N. p., 2015. Web. doi:10.1177/1094342015594515.
Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, & Ferreira da Silva, Rafael. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows. United States. https://doi.org/10.1177/1094342015594515
Deelman, Ewa, Carothers, Christopher, Mandal, Anirban, Tierney, Brian, Vetter, Jeffrey S., Baldin, Ilya, Castillo, Claris, Juve, Gideon, Krol, Dariusz, Lynch, Vickie, Mayer, Ben, Meredith, Jeremy, Proffen, Thomas, Ruth, Paul, and Ferreira da Silva, Rafael. 2015. "PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows". United States. https://doi.org/10.1177/1094342015594515. https://www.osti.gov/servlets/purl/1265426.
@article{osti_1265426,
title = {PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows},
author = {Deelman, Ewa and Carothers, Christopher and Mandal, Anirban and Tierney, Brian and Vetter, Jeffrey S. and Baldin, Ilya and Castillo, Claris and Juve, Gideon and Krol, Dariusz and Lynch, Vickie and Mayer, Ben and Meredith, Jeremy and Proffen, Thomas and Ruth, Paul and Ferreira da Silva, Rafael},
doi = {10.1177/1094342015594515},
url = {https://www.osti.gov/biblio/1265426},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {7}
}


Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science

Works referenced in this record:

LogGP: incorporating long messages into the LogP model---one step closer towards a realistic model for parallel computation
conference, January 1995

  • Alexandrov, Albert; Ionescu, Mihai F.; Schauser, Klaus E.
  • Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures - SPAA '95
  • https://doi.org/10.1145/215399.215427

Mantid—Data analysis and visualization package for neutron scattering and μ SR experiments
journal, November 2014

  • Arnold, O.; Bilheux, J. C.; Borreguero, J. M.
  • Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Vol. 764
  • https://doi.org/10.1016/j.nima.2014.07.029

Warp speed: executing time warp on 1,966,080 cores
conference, January 2013

  • Barnes, Peter D.; Carothers, Christopher D.; Jefferson, David R.
  • Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation - SIGSIM-PADS '13
  • https://doi.org/10.1145/2486092.2486134

Scalable Time Warp on Blue Gene Supercomputers
conference, June 2009

  • Bauer Jr., David W.; Carothers, Christopher D.; Holder, Akintayo
  • 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation (PADS)
  • https://doi.org/10.1109/PADS.2009.21

Characterization of scientific workflows
conference, November 2008


On deciding between conservative and optimistic approaches on massively parallel platforms
conference, December 2010


Efficient optimistic parallel simulations using reverse computation
journal, July 1999


LogP: towards a realistic model of parallel computation
journal, July 1993


On the communication complexity of 3D FFTs and its implications for Exascale
conference, January 2012


A Lightweight Middleware Monitor for Distributed Scientific Workflows
conference, May 2008

  • Cruz, Sergio Manuel Serra da; Silva, Fabricio Nogueira da; Gadelha Jr., Luiz M. R.
  • 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)
  • https://doi.org/10.1109/CCGRID.2008.89

Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example
conference, December 2006


Data Management Challenges of Data-Intensive Scientific Workflows
conference, May 2008

  • Deelman, Ewa; Chervenak, Ann
  • 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)
  • https://doi.org/10.1109/CCGRID.2008.24

Workflows and e-Science: An overview of workflow system features and capabilities
journal, May 2009


Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems
journal, January 2005


Pegasus, a workflow management system for science automation
journal, May 2015


Community Resources for Enabling Research in Distributed Scientific Workflows
conference, October 2014


Self-healing of workflow activity incidents on distributed computing infrastructures
journal, October 2013


Controlling fairness and task granularity in distributed, online, non-clairvoyant workflow executions
journal, May 2014

  • Ferreira da Silva, Rafael; Glatard, Tristan; Desprez, Frédéric
  • Concurrency and Computation: Practice and Experience, Vol. 26, Issue 14
  • https://doi.org/10.1002/cpe.3303

Toward fine-grained online task characteristics estimation in scientific workflows
conference, January 2013


An introductory exascale feasibility study for FFTs and multigrid
conference, April 2010


Measuring TeraGrid: workload characterization for a high-performance computing federation
journal, February 2011


Grid Computing Workloads
journal, March 2011


The Grid Workloads Archive
journal, July 2008


Using simulation to design extreme-scale applications and architectures: programming model exploration
journal, March 2011


Characterizing and profiling scientific workflows
journal, March 2013


Practical Resource Monitoring for Robust High Throughput Computing
conference, September 2015


The vision of autonomic computing
journal, January 2003


The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems
conference, May 2010


COMPASS: A Framework for Automated Performance Modeling and Prediction
conference, January 2015


Sassena — X-ray and neutron scattering calculated from molecular dynamics trajectories using massively parallel computers
journal, July 2012


On the role of burst buffers in leadership-class storage systems
conference, April 2012


Characterizing workflow-based activity on a production e-infrastructure using provenance data
journal, October 2013


Workload Characterization for Capacity Planning and Performance Management in IaaS Cloud
conference, October 2012


Enabling persistent queries for cross-aggregate performance monitoring
journal, May 2014


Evaluating I/O aware network management for scientific workflows on networked clouds
conference, January 2013


Auto-scaling to minimize cost and meet application deadlines in cloud workflows
conference, January 2011


The Spallation Neutron Source in Oak Ridge: A powerful tool for materials research
journal, November 2006


The macroscopic behavior of the TCP congestion avoidance algorithm
journal, July 1997


Modeling a Million-Node Dragonfly Network Using Massively Parallel Discrete-Event Simulation
conference, November 2012

  • Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
  • https://doi.org/10.1109/SC.Companion.2012.56

Dynamic Cloud provisioning for scientific Grid workflows
conference, October 2010


Scalable molecular dynamics with NAMD
journal, January 2005


Workload characterization on a production Hadoop cluster: A case study on Taobao
conference, November 2012


The structural simulation toolkit
journal, March 2011


Online Fault and Anomaly Detection for Large-Scale Scientific Workflows
conference, September 2011

  • Samak, Taghrid; Gunter, Dan; Goode, Monte
  • 2011 IEEE International Conference on High Performance Computing and Communications (HPCC)
  • https://doi.org/10.1109/HPCC.2011.55

Failure prediction and localization in large scientific workflows
conference, January 2011


Application-Level Resource Provisioning on the Grid
conference, December 2006


Aspen: A domain specific language for performance modeling
conference, November 2012

  • Spafford, Kyle L.; Vetter, Jeffrey S.
  • 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • https://doi.org/10.1109/SC.2012.20

Modeling synthetic aperture radar computation with Aspen
journal, July 2013


A Cleanup Algorithm for Implementing Storage Constraints in Scientific Workflow Executions
conference, November 2014


A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows
journal, June 2013


Rethinking data management for big data scientific workflows
conference, October 2013


A bridging model for parallel computation
journal, August 1990


Works referencing / citing this record:

The role of machine learning in scientific workflows
journal, May 2019


Symbolic regression in materials science
journal, June 2019