U.S. Department of Energy
Office of Scientific and Technical Information

PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [3];  [4];  [5];  [3];  [3];  [1];  [6];  [5];  [5];  [5];  [5];  [3];  [1]
  1. Univ. of Southern California, Los Angeles, CA (United States)
  2. Rensselaer Polytechnic Inst., Troy, NY (United States)
  3. Univ. of North Carolina, Chapel Hill, NC (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  5. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  6. Univ. of Southern California, Los Angeles, CA (United States); AGH - Univ. of Science and Technology, Krakow (Poland)
Computational science is well established as the third pillar of scientific discovery, on par with experimentation and theory. However, as we move closer to executing exascale calculations and processing the extreme-scale volumes of data produced by both experiments and computations, the complexity of managing compute and data analysis tasks has grown beyond the capabilities of domain scientists. Workflow management systems are therefore necessary to sustain current and future scientific discoveries. A key research question for these systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
Grant/Contract Number:
AC05-00OR22725; SC0012636
OSTI ID:
1265426
Journal Information:
International Journal of High Performance Computing Applications; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
