
Title: Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
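The propagation effect described in the abstract can be illustrated with a toy model. The following Python sketch is not the authors' implementation; it assumes a hypothetical one-directional pipeline in which each rank finishes its local work, receives from its predecessor, and then sends on immediately. The function names `forward_replay` and `backward_replay` and the sample `work` values are illustrative assumptions, and the attribution rule (charging each wait state entirely to the first non-waiting rank upstream) is a deliberate simplification.

```python
def forward_replay(work):
    """Forward pass: find wait states in a simple process pipeline.

    work[i] is the time at which rank i finishes local work and posts
    its receive. Rank i can only proceed once rank i-1 has sent.
    """
    waits, send_times = [], []
    for rank, ready in enumerate(work):
        # Rank 0 has no predecessor, so it never waits.
        arrival = send_times[rank - 1] if rank > 0 else ready
        waits.append(max(0.0, arrival - ready))   # late-sender waiting time
        send_times.append(max(ready, arrival))    # when this rank sends on
    return waits, send_times


def backward_replay(waits):
    """Backward pass: charge each wait state to the rank that started the chain.

    Simplifying assumption: a waiting sender only forwards a delay, so we
    walk upstream until we reach a rank that did not wait itself.
    """
    costs = {}
    for rank in range(len(waits) - 1, 0, -1):
        if waits[rank] == 0.0:
            continue
        origin = rank - 1
        while origin > 0 and waits[origin] > 0.0:
            origin -= 1
        costs[origin] = costs.get(origin, 0.0) + waits[rank]
    return costs


if __name__ == "__main__":
    work = [5.0, 2.0, 3.0, 6.0]   # hypothetical per-rank work times
    waits, _ = forward_replay(work)
    print(waits)                   # per-rank waiting time
    print(backward_replay(waits))  # waiting time charged to each root cause
```

In this toy run, the overload on rank 0 makes rank 1 wait directly and rank 2 wait indirectly, so both wait states are charged to rank 0. This mirrors, in miniature, the idea of following a cause-effect chain back to its origin.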

Authors:
 Böhme, David [1]; Geimer, Markus [2]; Arnold, Lukas [2]; Voigtlaender, Felix [3]; Wolf, Felix [4]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  2. Forschungszentrum Jülich (Germany). Jülich Supercomputing Centre (JSC)
  3. RWTH Aachen Univ. (Germany)
  4. Technical Univ. of Darmstadt (Germany)
Publication Date:
July 2016
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1305834
Report Number(s):
LLNL-JRNL-663039
Journal ID: ISSN 2329-4949
Grant/Contract Number:  
AC52-07NA27344; GSC 111; VH-NG-118
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Parallel Computing
Additional Journal Information:
Journal Volume: 3; Journal Issue: 2; Journal ID: ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Performance analysis; cause analysis; load imbalance; event tracing; MPI; OpenMP

Citation Formats

Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States: N. p., 2016. Web. doi:10.1145/2934661.
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, & Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States. doi:10.1145/2934661.
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. "Identifying the Root Causes of Wait States in Large-Scale Parallel Applications". United States. doi:10.1145/2934661. https://www.osti.gov/servlets/purl/1305834.
@article{osti_1305834,
title = {Identifying the Root Causes of Wait States in Large-Scale Parallel Applications},
author = {Böhme, David and Geimer, Markus and Arnold, Lukas and Voigtlaender, Felix and Wolf, Felix},
abstractNote = {Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.},
doi = {10.1145/2934661},
journal = {ACM Transactions on Parallel Computing},
number = 2,
volume = 3,
place = {United States},
year = {2016},
month = {7}
}

Works referenced in this record:

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
conference, November 2010

  • Tallent, Nathan R.; Adhianto, Laksono; Mellor-Crummey, John M.
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2010.47

An online computation of critical path profiling
conference, January 1996

  • Hollingsworth, Jeffrey K.
  • Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96
  • DOI: 10.1145/238020.238024

Waiting time analysis and performance visualization in Carnival
conference, January 1996

  • Meira, Wagner; LeBlanc, Thomas J.; Poulos, Alexandros
  • Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96
  • DOI: 10.1145/238020.238023

Scalable load-balance measurement for SPMD codes
conference, November 2008

  • Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin
  • 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2008.5222553

Understanding the formation of wait states in applications with one-sided communication
conference, January 2013

  • Hermanns, Marc-André; Miklosch, Manfred; Böhme, David
  • Proceedings of the 20th European MPI Users' Group Meeting - EuroMPI '13
  • DOI: 10.1145/2488551.2488569

Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS-MP
journal, July 2006

  • Hayes, John C.; Norman, Michael L.; Fiedler, Robert A.
  • The Astrophysical Journal Supplement Series, Vol. 165, Issue 1
  • DOI: 10.1086/504594

Using cause-effect analysis to understand the performance of distributed programs
conference, January 1998

  • Meira, Wagner; LeBlanc, Thomas J.; Almeida, Virgílio A. F.
  • Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '98
  • DOI: 10.1145/281035.281046

A scalable tool architecture for diagnosing wait states in massively parallel applications
journal, July 2009


Extracting Critical Path Graphs from MPI Applications
conference, September 2005


Scalable Critical-Path Based Performance Analysis
conference, May 2012

  • Böhme, David; Wolf, Felix; de Supinski, Bronis R.
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2012.120

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
conference, September 2010

  • Böhme, David; Geimer, Markus; Wolf, Felix
  • 2010 39th International Conference on Parallel Processing (ICPP)
  • DOI: 10.1109/ICPP.2010.18

Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset
conference, April 2010

  • Wylie, Brian J. N.; Böhme, David; Mohr, Bernd
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
  • DOI: 10.1109/IPDPSW.2010.5470816

A methodology towards automatic performance analysis of parallel applications
journal, February 2004


Predictive analysis of a wavefront application using LogGP
journal, August 1999

  • Sundaram-Stukel, David; Vernon, Mary K.
  • ACM SIGPLAN Notices, Vol. 34, Issue 8
  • DOI: 10.1145/329366.301117

Space-efficient time-series call-path profiling of parallel applications
conference, January 2009

  • Szebenyi, Zoltán; Wolf, Felix; Wylie, Brian J. N.
  • Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
  • DOI: 10.1145/1654059.1654097

Scalable timestamp synchronization for event traces of message-passing applications
journal, December 2009


HPCTOOLKIT: tools for performance analysis of optimized parallel programs
journal, January 2009

  • Adhianto, L.; Banerjee, S.; Fagan, M.
  • Concurrency and Computation: Practice and Experience
  • DOI: 10.1002/cpe.1553

3D simulations of surface harmonic generation with few-cycle laser pulses
journal, July 2007