OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
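To illustrate what "attributing wait costs to their original cause" means along a cause-effect chain, here is a minimal toy sketch in Python. It is not the authors' parallel trace-replay implementation: it runs sequentially on a hypothetical timing model in which process i computes for `work[i]`, then (if i > 0) blocks in a receive from process i-1, then (if i < n-1) sends to process i+1 without blocking. The function names `forward_pass` and `backward_pass`, and the proportional split of each wait state into a "direct" part (sender's excess computation) and a "propagated" part (sender's own waiting), are illustrative assumptions.

```python
"""Toy forward/backward wait-state analysis for a single message chain.

Hypothetical model, not the paper's algorithm: process i computes work[i],
then (i > 0) blocks in a receive from i-1, then (i < n-1) sends to i+1
without blocking. Assumes at least one process.
"""
from typing import List, Tuple


def forward_pass(work: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Replay the chain in time order and detect late-sender wait states.

    Returns (wait, direct, propagated): wait[i] is the time process i blocks
    in its receive; direct[i] is the part explained by the sender's excess
    computation; propagated[i] is the part explained by the sender's own wait.
    """
    n = len(work)
    wait = [0.0] * n
    direct = [0.0] * n
    propagated = [0.0] * n
    send_time = work[0]  # process 0 sends right after computing
    for i in range(1, n):
        enter_recv = work[i]
        wait[i] = max(0.0, send_time - enter_recv)
        # Excess computation of the sender relative to the receiver.
        excess = max(0.0, work[i - 1] - work[i])
        direct[i] = min(wait[i], excess)
        propagated[i] = wait[i] - direct[i]
        # The receive completes when the message arrives; i then sends onward.
        send_time = max(enter_recv, send_time)
    return wait, direct, propagated


def backward_pass(wait: List[float], direct: List[float],
                  propagated: List[float]) -> List[float]:
    """Walk the chain in reverse and charge waiting time to root-cause delays.

    Returns cost[j]: total waiting time (own plus downstream, i.e. short-term
    plus long-term) attributed to excess computation on process j.
    """
    n = len(wait)
    cost = [0.0] * n
    carry = 0.0  # cost inherited from downstream wait states
    for i in range(n - 1, 0, -1):
        if wait[i] == 0.0:
            carry = 0.0  # nothing can propagate through a zero wait
            continue
        total = wait[i] + carry
        # Split proportionally: direct share goes to the sender's delay,
        # propagated share is handed back to the sender's own wait state.
        cost[i - 1] += total * direct[i] / wait[i]
        carry = total * propagated[i] / wait[i]
    return cost


if __name__ == "__main__":
    # Process 0 is 2 time units slower; its delay makes 1 wait, which
    # in turn makes 2 wait. All 4 units of waiting trace back to process 0.
    print(backward_pass(*forward_pass([3.0, 1.0, 1.0])))  # [4.0, 0.0, 0.0]
```

In this toy model, process 0's short-term cost is the 2 units process 1 waits directly, and its long-term cost is the further 2 units that process 2 waits because process 1 was late. Since each wait state is fully split between a delay and an upstream wait state, the total attributed cost equals the total waiting time.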

Authors:
 Böhme, David [1]; Geimer, Markus [2]; Arnold, Lukas [2]; Voigtlaender, Felix [3]; Wolf, Felix [4]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  2. Forschungszentrum Jülich (Germany). Jülich Supercomputing Centre (JSC)
  3. RWTH Aachen Univ. (Germany)
  4. Technical Univ. of Darmstadt (Germany)
Publication Date:
July 20, 2016
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1305834
Report Number(s):
LLNL-JRNL-663039
Journal ID: ISSN 2329-4949
Grant/Contract Number:  
AC52-07NA27344; GSC 111; VH-NG-118
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
ACM Transactions on Parallel Computing
Additional Journal Information:
Journal Volume: 3; Journal Issue: 2; Journal ID: ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Performance analysis; cause analysis; load imbalance; event tracing; MPI; OpenMP

Citation Formats

Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States: N. p., 2016. Web. doi:10.1145/2934661.
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, & Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States. https://doi.org/10.1145/2934661
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. 2016. "Identifying the Root Causes of Wait States in Large-Scale Parallel Applications". United States. https://doi.org/10.1145/2934661. https://www.osti.gov/servlets/purl/1305834.
@article{osti_1305834,
title = {Identifying the Root Causes of Wait States in Large-Scale Parallel Applications},
author = {Böhme, David and Geimer, Markus and Arnold, Lukas and Voigtlaender, Felix and Wolf, Felix},
abstractNote = {Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.},
doi = {10.1145/2934661},
url = {https://www.osti.gov/biblio/1305834},
journal = {ACM Transactions on Parallel Computing},
issn = {2329-4949},
number = 2,
volume = 3,
place = {United States},
year = {2016},
month = jul
}

Works referenced in this record:

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
conference, November 2010

  • Tallent, Nathan R.; Adhianto, Laksono; Mellor-Crummey, John M.
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC10)
  • https://doi.org/10.1109/SC.2010.47

Predictive analysis of a wavefront application using LogGP
conference, January 1999

  • Sundaram-Stukel, David; Vernon, Mary K.
  • Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '99
  • https://doi.org/10.1145/301104.301117

Workshop on wide area networks and high performance computing
book, January 1999


Bubble acceleration of electrons with few-cycle laser pulses
journal, September 2006


Extracting Critical Path Graphs from MPI Applications
conference, September 2005


On the Performance of Transparent MPI Piggyback Messages
book, January 2008


SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications
book, January 2008


Scalable Critical-Path Based Performance Analysis
conference, May 2012

  • Böhme, David; Wolf, Felix; de Supinski, Bronis R.
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
  • https://doi.org/10.1109/IPDPS.2012.120

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
conference, September 2010


Space-efficient time-series call-path profiling of parallel applications
conference, January 2009


Scalable timestamp synchronization for event traces of message-passing applications
journal, December 2009


3D simulations of surface harmonic generation with few-cycle laser pulses
journal, July 2007


An online computation of critical path profiling
conference, January 1996


Waiting time analysis and performance visualization in Carnival
conference, January 1996


Scalable load-balance measurement for SPMD codes
conference, November 2008

  • Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin
  • 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • https://doi.org/10.1109/SC.2008.5222553

Understanding the formation of wait states in applications with one-sided communication
conference, January 2013


Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS‐MP
journal, July 2006

  • Hayes, John C.; Norman, Michael L.; Fiedler, Robert A.
  • The Astrophysical Journal Supplement Series, Vol. 165, Issue 1
  • https://doi.org/10.1086/504594

Using cause-effect analysis to understand the performance of distributed programs
conference, January 1998


On-Line Performance Modeling for MPI Applications
book, January 2008


A scalable tool architecture for diagnosing wait states in massively parallel applications
journal, July 2009


HPCTOOLKIT: tools for performance analysis of optimized parallel programs
journal, January 2009


Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset
conference, April 2010

  • Wylie, Brian J. N.; Böhme, David; Mohr, Bernd
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW 2010)
  • https://doi.org/10.1109/IPDPSW.2010.5470816

A methodology towards automatic performance analysis of parallel applications
journal, February 2004


Predictive analysis of a wavefront application using LogGP
journal, August 1999


Works referencing / citing this record:

Automated Analysis of Time Series Data to Understand Parallel Program Behaviors
conference, June 2018

  • Wei, Lai; Mellor-Crummey, John
  • ICS '18: Proceedings of the 2018 International Conference on Supercomputing
  • https://doi.org/10.1145/3205289.3205308