Title: Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Journal Article · ACM Transactions on Parallel Computing
DOI: https://doi.org/10.1145/2934661 · OSTI ID: 1305834
 [1];  [2];  [2];  [3];  [4]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  2. Forschungszentrum Julich (Germany). Julich Supercomputing Centre (JSC)
  3. RWTH Aachen Univ. (Germany)
  4. Technical Univ. of Darmstadt (Germany)

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
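As a rough illustration of the idea in the abstract (not the authors' algorithm or tooling), the sketch below uses a toy pipeline in which each rank computes, then blocks until its left neighbor's message arrives, then forwards a message to the right. The function names `forward_pass` and `backward_pass`, the pipeline model, and the workload numbers are all assumptions made for this example. A forward pass finds the wait states; a backward pass charges the accumulated waiting time to the nearest upstream rank that was not itself waiting.

```python
# Toy model only: a left-to-right pipeline of ranks. This is an assumed,
# simplified setting and not the trace-replay method from the article.

def forward_pass(work):
    """Compute per-rank waiting time and message-ready time.

    work[i] is the local computation time of rank i. Each rank then blocks
    until the message from rank i-1 is ready (rank 0 receives nothing).
    """
    wait, ready = [], []
    prev_ready = 0.0  # rank 0 has no upstream sender
    for w in work:
        waited = max(0.0, prev_ready - w)  # late-sender style wait state
        wait.append(waited)
        prev_ready = w + waited            # time at which this rank can send
        ready.append(prev_ready)
    return wait, ready


def backward_pass(wait):
    """Attribute waiting time to the upstream rank that caused it.

    Walking from the last rank to the first, waiting time is passed
    upstream through ranks that were themselves waiting, and charged to
    the first rank that was not (the assumed root cause in this toy model).
    """
    blame = [0.0] * len(wait)
    pending = 0.0
    for rank in reversed(range(len(wait))):
        if wait[rank] > 0:
            pending += wait[rank]   # this rank only propagated the delay
        else:
            blame[rank] = pending   # this rank finished on its own schedule
            pending = 0.0
    return blame


if __name__ == "__main__":
    work = [1.0, 1.0, 5.0, 1.0, 1.0]   # rank 2 is overloaded (hypothetical)
    wait, ready = forward_pass(work)
    blame = backward_pass(wait)
    for rank, (w, b) in enumerate(zip(wait, blame)):
        print(f"rank {rank}: waited {w:.1f}, blamed for {b:.1f}")
```

In this toy run, ranks 3 and 4 each wait, yet only rank 3 waits directly on the overloaded rank 2; the wait on rank 4 is propagated through rank 3. The backward pass therefore charges the full waiting time to rank 2, which is the kind of cause-effect chain the abstract describes.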

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC52-07NA27344; GSC 111; VH-NG-118
OSTI ID:
1305834
Report Number(s):
LLNL-JRNL-663039
Journal Information:
ACM Transactions on Parallel Computing, Vol. 3, Issue 2; ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English

References (24)

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
  • Tallent, Nathan R.; Adhianto, Laksono; Mellor-Crummey, John M.
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2010.47
conference November 2010
Predictive analysis of a wavefront application using LogGP
  • Sundaram-Stukel, David; Vernon, Mary K.
  • Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '99 https://doi.org/10.1145/301104.301117
conference January 1999
Workshop on wide area networks and high performance computing
book January 1999
Bubble acceleration of electrons with few-cycle laser pulses
journal September 2006
Extracting Critical Path Graphs from MPI Applications
conference September 2005
On the Performance of Transparent MPI Piggyback Messages
book January 2008
SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications
book January 2008
Scalable Critical-Path Based Performance Analysis
  • Bohme, David; Wolf, Felix; de Supinski, Bronis R.
  • 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2012 IEEE 26th International Parallel and Distributed Processing Symposium https://doi.org/10.1109/IPDPS.2012.120
conference May 2012
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
conference September 2010
Space-efficient time-series call-path profiling of parallel applications
conference January 2009
Scalable timestamp synchronization for event traces of message-passing applications
journal December 2009
3D simulations of surface harmonic generation with few-cycle laser pulses
journal July 2007
An online computation of critical path profiling
conference January 1996
Waiting time analysis and performance visualization in Carnival
conference January 1996
Scalable load-balance measurement for SPMD codes
  • Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin
  • 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2008.5222553
conference November 2008
Understanding the formation of wait states in applications with one-sided communication
conference January 2013
Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS‐MP
  • Hayes, John C.; Norman, Michael L.; Fiedler, Robert A.
  • The Astrophysical Journal Supplement Series, Vol. 165, Issue 1 https://doi.org/10.1086/504594
journal July 2006
Using cause-effect analysis to understand the performance of distributed programs
conference January 1998
On-Line Performance Modeling for MPI Applications
book January 2008
A scalable tool architecture for diagnosing wait states in massively parallel applications
journal July 2009
HPCTOOLKIT: tools for performance analysis of optimized parallel programs
journal January 2009
Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset
  • Wylie, Brian J. N.; Bohme, David; Mohr, Bernd
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010), 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) https://doi.org/10.1109/IPDPSW.2010.5470816
conference April 2010
A methodology towards automatic performance analysis of parallel applications
journal February 2004
Predictive analysis of a wavefront application using LogGP
journal August 1999

Cited By (1)

Automated Analysis of Time Series Data to Understand Parallel Program Behaviors
  • Wei, Lai; Mellor-Crummey, John
  • ICS '18: 2018 International Conference on Supercomputing, Proceedings of the 2018 International Conference on Supercomputing https://doi.org/10.1145/3205289.3205308
conference June 2018