Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Böhme, David; Geimer, Markus; Arnold, Lukas; Voigtlaender, Felix; Wolf, Felix

doi:10.1145/2934661

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
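
To illustrate the difference between the direct cause and the root cause of a wait state, the following Python sketch uses a deliberately simplified, hypothetical model. It assumes a linear pipeline in which rank i computes for work[i] time units, blocks in a receive from rank i-1, and then immediately sends to rank i+1, with zero message latency. This is not the authors' method, which replays event traces in parallel both forward and backward. The function names pipeline_wait_states, direct_cause, root_cause, and attribute are illustrative only. A late-sender wait is blamed either on the immediate sender or, by walking the cause-effect chain backward, on the rank whose own computation made the message late.

```python
from collections import defaultdict


def pipeline_wait_states(work):
    """Toy (hypothetical) pipeline model: rank i computes for work[i] time
    units, then (for i > 0) blocks in a receive from rank i-1, then sends
    to rank i+1 immediately. Message transfer time is assumed to be zero.
    Returns per-rank wait times and per-rank send times."""
    wait, send_time = [], []
    for i, w in enumerate(work):
        if i == 0:
            idle = 0.0
        else:
            # Late-sender wait state: the receive is posted at time w, but
            # the matching send only starts at send_time[i - 1].
            idle = max(0.0, send_time[i - 1] - w)
        wait.append(idle)
        send_time.append(w + idle)
    return wait, send_time


def direct_cause(rank):
    # The immediate cause of a late-sender wait at `rank` is its sender.
    return rank - 1


def root_cause(rank, wait):
    """Walk the chain backward until reaching a sender that was late because
    of its own computation rather than because it was itself waiting."""
    j = rank - 1
    while wait[j] > 0.0:  # rank 0 never waits, so the loop terminates
        j -= 1  # the sender was delayed by a wait state: go upstream
    return j


def attribute(work):
    """Sum wait time per blamed rank, once by direct cause, once by root cause."""
    wait, _ = pipeline_wait_states(work)
    direct, root = defaultdict(float), defaultdict(float)
    for rank, w in enumerate(wait):
        if w > 0.0:
            direct[direct_cause(rank)] += w
            root[root_cause(rank, wait)] += w
    return wait, dict(direct), dict(root)


if __name__ == "__main__":
    wait, direct, root = attribute([1.0, 4.0, 2.0, 1.0, 3.0])
    print("wait per rank:", wait)
    print("blamed on direct sender:", direct)
    print("blamed on root cause:", root)
```

For work = [1.0, 4.0, 2.0, 1.0, 3.0], ranks 2, 3, and 4 wait 2.0, 3.0, and 1.0 time units. Blaming direct senders spreads this cost over ranks 1, 2, and 3, whereas walking the chain backward attributes all 6.0 units to rank 1, whose excess computation started the propagation.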

Authors:

Böhme, David [1]; Geimer, Markus [2]; Arnold, Lukas [2]; Voigtlaender, Felix [3]; Wolf, Felix [4]

[1] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
[2] Forschungszentrum Jülich (Germany). Jülich Supercomputing Centre (JSC)
[3] RWTH Aachen Univ. (Germany)
[4] Technical Univ. of Darmstadt (Germany)

Publication Date: July 20, 2016

Research Org.: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Org.: USDOE

OSTI Identifier: 1305834

Report Number(s): LLNL-JRNL-663039
Journal ID: ISSN 2329-4949

Grant/Contract Number: AC52-07NA27344; GSC 111; VH-NG-118

Resource Type: Accepted Manuscript

Journal Name: ACM Transactions on Parallel Computing

Additional Journal Information: Journal Volume: 3; Journal Issue: 2; Journal ID: ISSN 2329-4949

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; Performance analysis; cause analysis; load imbalance; event tracing; MPI; OpenMP

Citation Formats

Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States: N. p., 2016. Web. doi:10.1145/2934661.

Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, & Wolf, Felix. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. United States. https://doi.org/10.1145/2934661

Böhme, David, Geimer, Markus, Arnold, Lukas, Voigtlaender, Felix, and Wolf, Felix. 2016. "Identifying the Root Causes of Wait States in Large-Scale Parallel Applications." United States. https://doi.org/10.1145/2934661. https://www.osti.gov/servlets/purl/1305834.

@article{osti_1305834,
  title        = {Identifying the Root Causes of Wait States in Large-Scale Parallel Applications},
  author       = {Böhme, David and Geimer, Markus and Arnold, Lukas and Voigtlaender, Felix and Wolf, Felix},
  abstractNote = {Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.},
  doi          = {10.1145/2934661},
  journal      = {ACM Transactions on Parallel Computing},
  number       = 2,
  volume       = 3,
  place        = {United States},
  year         = {2016},
  month        = jul
}
