Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
Abstract
Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
- Authors:
- Böhme, David; Geimer, Markus; Arnold, Lukas; Voigtlaender, Felix; Wolf, Felix
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Forschungszentrum Julich (Germany). Julich Supercomputing Centre (JSC)
- RWTH Aachen Univ. (Germany)
- Technical Univ. of Darmstadt (Germany)
- Publication Date:
- 2016-07-20
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1305834
- Report Number(s):
- LLNL-JRNL-663039; Journal ID: ISSN 2329-4949
- Grant/Contract Number:
- AC52-07NA27344; GSC 111; VH-NG-118
- Resource Type:
- Accepted Manuscript
- Journal Name:
- ACM Transactions on Parallel Computing
- Additional Journal Information:
- Journal Volume: 3; Journal Issue: 2; Journal ID: ISSN 2329-4949
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Performance analysis; cause analysis; load imbalance; event tracing; MPI; OpenMP
Citation Formats
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States: N. p., 2016. Web. doi:10.1145/2934661.
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, & Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States. https://doi.org/10.1145/2934661
Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. 2016. "Identifying the Root Causes of Wait States in Large-Scale Parallel Applications". United States. https://doi.org/10.1145/2934661. https://www.osti.gov/servlets/purl/1305834.
@article{osti_1305834,
title = {Identifying the Root Causes of Wait States in Large-Scale Parallel Applications},
author = {Böhme, David and Geimer, Markus and Arnold, Lukas and Voigtlaender, Felix and Wolf, Felix},
abstractNote = {Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.},
doi = {10.1145/2934661},
journal = {ACM Transactions on Parallel Computing},
number = 2,
volume = 3,
place = {United States},
year = {2016},
month = jul
}
Works referenced in this record:
Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
conference, November 2010
- Tallent, Nathan R.; Adhianto, Laksono; Mellor-Crummey, John M.
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Predictive analysis of a wavefront application using LogGP
conference, January 1999
- Sundaram-Stukel, David; Vernon, Mary K.
- Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '99
Workshop on wide area networks and high performance computing
book, January 1999
- Cooperman, G.; Jessen, E.; Michler, G.
- Lecture Notes in Control and Information Sciences
Bubble acceleration of electrons with few-cycle laser pulses
journal, September 2006
- Geissler, Michael; Schreiber, Jörg; Meyer-ter-Vehn, Jürgen
- New Journal of Physics, Vol. 8, Issue 9
Extracting Critical Path Graphs from MPI Applications
conference, September 2005
- Schulz, Martin
- 2005 IEEE International Conference on Cluster Computing
On the Performance of Transparent MPI Piggyback Messages
book, January 2008
- Schulz, Martin; Bronevetsky, Greg; de Supinski, Bronis R.
- Recent Advances in Parallel Virtual Machine and Message Passing Interface
SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications
book, January 2008
- Szebenyi, Zoltán; Wylie, Brian J. N.; Wolf, Felix
- Performance Evaluation: Metrics, Models and Benchmarks
Scalable Critical-Path Based Performance Analysis
conference, May 2012
- Böhme, David; Wolf, Felix; de Supinski, Bronis R.
- 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
conference, September 2010
- Böhme, David; Geimer, Markus; Wolf, Felix
- 2010 39th International Conference on Parallel Processing (ICPP)
Space-efficient time-series call-path profiling of parallel applications
conference, January 2009
- Szebenyi, Zoltán; Wolf, Felix; Wylie, Brian J. N.
- Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
Scalable timestamp synchronization for event traces of message-passing applications
journal, December 2009
- Becker, Daniel; Rabenseifner, Rolf; Wolf, Felix
- Parallel Computing, Vol. 35, Issue 12
3D simulations of surface harmonic generation with few-cycle laser pulses
journal, July 2007
- Geissler, M.; Rykovanov, S.; Schreiber, J.
- New Journal of Physics, Vol. 9, Issue 7
An online computation of critical path profiling
conference, January 1996
- Hollingsworth, Jeffrey K.
- Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96
Waiting time analysis and performance visualization in Carnival
conference, January 1996
- Meira, Wagner; LeBlanc, Thomas J.; Poulos, Alexandros
- Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96
Scalable load-balance measurement for SPMD codes
conference, November 2008
- Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin
- 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
Understanding the formation of wait states in applications with one-sided communication
conference, January 2013
- Hermanns, Marc-André; Miklosch, Manfred; Böhme, David
- Proceedings of the 20th European MPI Users' Group Meeting - EuroMPI '13
Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS‐MP
journal, July 2006
- Hayes, John C.; Norman, Michael L.; Fiedler, Robert A.
- The Astrophysical Journal Supplement Series, Vol. 165, Issue 1
Using cause-effect analysis to understand the performance of distributed programs
conference, January 1998
- Meira, Wagner; LeBlanc, Thomas J.; Almeida, Virgílio A. F.
- Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '98
On-Line Performance Modeling for MPI Applications
book, January 2008
- Morajko, Oleg; Morajko, Anna; Margalef, Tomàs
- Lecture Notes in Computer Science
A scalable tool architecture for diagnosing wait states in massively parallel applications
journal, July 2009
- Geimer, Markus; Wolf, Felix; Wylie, Brian J. N.
- Parallel Computing, Vol. 35, Issue 7
HPCTOOLKIT: tools for performance analysis of optimized parallel programs
journal, January 2009
- Adhianto, L.; Banerjee, S.; Fagan, M.
- Concurrency and Computation: Practice and Experience
Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset
conference, April 2010
- Wylie, Brian J. N.; Böhme, David; Mohr, Bernd
- 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
A methodology towards automatic performance analysis of parallel applications
journal, February 2004
- Calzarossa, Maria; Massari, Luisa; Tessera, Daniele
- Parallel Computing, Vol. 30, Issue 2
Predictive analysis of a wavefront application using LogGP
journal, August 1999
- Sundaram-Stukel, David; Vernon, Mary K.
- ACM SIGPLAN Notices, Vol. 34, Issue 8
Works referencing / citing this record:
Automated Analysis of Time Series Data to Understand Parallel Program Behaviors
conference, June 2018
- Wei, Lai; Mellor-Crummey, John
- Proceedings of the 2018 International Conference on Supercomputing (ICS '18)