Title: PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [3];  [4];  [5];  [3];  [3];  [1];  [6];  [5];  [5];  [5];  [5];  [3];  [1]
  1. Univ. of Southern California, Los Angeles, CA (United States)
  2. Rensselaer Polytechnic Inst., Troy, NY (United States)
  3. Univ. of North Carolina, Chapel Hill, NC (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  6. Univ. of Southern California, Los Angeles, CA (United States); AGH - Univ. of Science and Technology, Krakow (Poland)

Computational science is well established as the third pillar of scientific discovery, on par with experimentation and theory. However, as we move closer to executing exascale calculations and processing the extreme-scale volumes of data produced by experiments and computations alike, the complexity of managing compute and data analysis tasks has grown beyond the capabilities of domain scientists. Workflow management systems are therefore necessary to ensure current and future scientific discoveries. A key research question for these systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source (SNS)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC05-00OR22725; SC0012636
OSTI ID:
1265426
Journal Information:
International Journal of High Performance Computing Applications; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 19 works
Citation information provided by
Web of Science

Cited By (2)

The role of machine learning in scientific workflows journal May 2019
Symbolic regression in materials science journal June 2019

Similar Records

A characterization of workflow management systems for extreme-scale applications
Journal Article · February 16, 2017 · Future Generation Computer Systems · OSTI ID:1265426

Integrated End-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows
Technical Report · May 19, 2023 · OSTI ID:1265426

Panorama 360 (Final Report)
Technical Report · June 22, 2023 · OSTI ID:1265426