Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Böhme, David; Geimer, Markus; Arnold, Lukas; Voigtlaender, Felix; Wolf, Felix

doi:10.1145/2934661

Journal Article · July 20, 2016 · ACM Transactions on Parallel Computing

DOI: https://doi.org/10.1145/2934661 · OSTI ID: 1305834

Böhme, David ^[1]; Geimer, Markus ^[2]; Arnold, Lukas ^[2]; Voigtlaender, Felix ^[3]; Wolf, Felix ^[4]

^[1] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
^[2] Forschungszentrum Jülich (Germany). Jülich Supercomputing Centre (JSC)
^[3] RWTH Aachen Univ. (Germany)
^[4] Technical Univ. of Darmstadt (Germany)

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
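
The cost attribution described in the abstract can be illustrated with a toy model. The sketch below is not the authors' algorithm or any tool's API. It assumes a hypothetical one-step pipeline in which each process computes for `work[i]` time units, then receives from its left neighbor, then sends to its right neighbor. A forward pass finds the wait states, and a backward pass charges their cost to the excess computation that caused them. Splitting each charge proportionally between the sender's own excess work and the waiting the sender itself suffered is a simplifying assumption, and the name `analyze_pipeline` and all variable names are made up for illustration.

```python
def analyze_pipeline(work):
    """Toy wait-state analysis for a one-step pipeline (illustrative only).

    work[i] is the compute time of process i before it communicates.
    Returns (wait, delay_cost): the waiting time of each process and the
    total waiting time charged to each process as a root cause.
    """
    n = len(work)
    if n == 0:
        return [], []

    # Forward pass: find when each process sends and how long each receiver waits.
    wait = [0.0] * n
    send_time = [0.0] * n
    send_time[0] = work[0]          # process 0 has no left neighbor to wait for
    for i in range(1, n):
        enter = work[i]             # time at which process i enters its receive
        wait[i] = max(0.0, send_time[i - 1] - enter)
        send_time[i] = enter + wait[i]

    # Backward pass: push the cost of each wait state back toward its cause.
    delay_cost = [0.0] * n          # cost charged to a process's excess compute
    carried = [0.0] * n             # cost passed through a process's own wait
    for i in range(n - 1, 0, -1):
        to_distribute = wait[i] + carried[i]
        if to_distribute == 0:
            continue
        s = i - 1                                   # the late sender
        excess = max(0.0, work[s] - work[i])        # sender's extra compute
        total = excess + wait[s]                    # everything that made s late
        delay_cost[s] += to_distribute * excess / total
        carried[s] += to_distribute * wait[s] / total

    return wait, delay_cost


if __name__ == "__main__":
    waits, costs = analyze_pipeline([5, 1, 1])
    print("wait per process:      ", waits)
    print("root-cause cost charged:", costs)
```

For `work = [5, 1, 1]`, processes 1 and 2 each wait 4 time units, and the whole cost of 8 is charged to process 0, even though process 2 never communicates with it directly. This is the kind of propagation along a cause-effect chain that the abstract describes.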

Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE

Grant/Contract Number: AC52-07NA27344; GSC 111; VH-NG-118

OSTI ID: 1305834

Report Number(s): LLNL-JRNL-663039

Journal Information: ACM Transactions on Parallel Computing, Vol. 3, Issue 2; ISSN 2329-4949

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English

References (24)

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles. Tallent, Nathan R.; Adhianto, Laksono; Mellor-Crummey, John M. 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC). https://doi.org/10.1109/SC.2010.47 (conference, November 2010)
Predictive analysis of a wavefront application using LogGP. Sundaram-Stukel, David; Vernon, Mary K. Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '99. https://doi.org/10.1145/301104.301117 (conference, January 1999)
Workshop on wide area networks and high performance computing. Cooperman, G.; Jessen, E.; Michler, G. Lecture Notes in Control and Information Sciences. https://doi.org/10.1007/BFb0110074 (book, January 1999)
Bubble acceleration of electrons with few-cycle laser pulses. Geissler, Michael; Schreiber, Jörg; Meyer-ter-Vehn, Jürgen. New Journal of Physics, Vol. 8, Issue 9. https://doi.org/10.1088/1367-2630/8/9/186 (journal, September 2006)
Extracting Critical Path Graphs from MPI Applications. Schulz, Martin. 2005 IEEE International Conference on Cluster Computing. https://doi.org/10.1109/CLUSTR.2005.347035 (conference, September 2005)
On the Performance of Transparent MPI Piggyback Messages. Schulz, Martin; Bronevetsky, Greg; de Supinski, Bronis R. Recent Advances in Parallel Virtual Machine and Message Passing Interface. https://doi.org/10.1007/978-3-540-87475-1_28 (book, January 2008)
SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications. Szebenyi, Zoltán; Wylie, Brian J. N.; Wolf, Felix. Performance Evaluation: Metrics, Models and Benchmarks. https://doi.org/10.1007/978-3-540-69814-2_8 (book, January 2008)
Scalable Critical-Path Based Performance Analysis. Böhme, David; Wolf, Felix; de Supinski, Bronis R. 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/IPDPS.2012.120 (conference, May 2012)
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. Böhme, David; Geimer, Markus; Wolf, Felix. 2010 39th International Conference on Parallel Processing (ICPP). https://doi.org/10.1109/ICPP.2010.18 (conference, September 2010)
Space-efficient time-series call-path profiling of parallel applications. Szebenyi, Zoltán; Wolf, Felix; Wylie, Brian J. N. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09. https://doi.org/10.1145/1654059.1654097 (conference, January 2009)
Scalable timestamp synchronization for event traces of message-passing applications. Becker, Daniel; Rabenseifner, Rolf; Wolf, Felix. Parallel Computing, Vol. 35, Issue 12. https://doi.org/10.1016/j.parco.2008.12.012 (journal, December 2009)
3D simulations of surface harmonic generation with few-cycle laser pulses. Geissler, M.; Rykovanov, S.; Schreiber, J. New Journal of Physics, Vol. 9, Issue 7. https://doi.org/10.1088/1367-2630/9/7/218 (journal, July 2007)
An online computation of critical path profiling. Hollingsworth, Jeffrey K. Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96. https://doi.org/10.1145/238020.238024 (conference, January 1996)
Waiting time analysis and performance visualization in Carnival. Meira, Wagner; LeBlanc, Thomas J.; Poulos, Alexandros. Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '96. https://doi.org/10.1145/238020.238023 (conference, January 1996)
Scalable load-balance measurement for SPMD codes. Gamblin, Todd; de Supinski, Bronis R.; Schulz, Martin. 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/SC.2008.5222553 (conference, November 2008)
Understanding the formation of wait states in applications with one-sided communication. Hermanns, Marc-André; Miklosch, Manfred; Böhme, David. Proceedings of the 20th European MPI Users' Group Meeting on - EuroMPI '13. https://doi.org/10.1145/2488551.2488569 (conference, January 2013)
Simulating Radiating and Magnetized Flows in Multiple Dimensions with ZEUS‐MP. Hayes, John C.; Norman, Michael L.; Fiedler, Robert A. The Astrophysical Journal Supplement Series, Vol. 165, Issue 1. https://doi.org/10.1086/504594 (journal, July 2006)
Using cause-effect analysis to understand the performance of distributed programs. Meira, Wagner; LeBlanc, Thomas J.; Almeida, Virgílio A. F. Proceedings of the SIGMETRICS symposium on Parallel and distributed tools - SPDT '98. https://doi.org/10.1145/281035.281046 (conference, January 1998)
On-Line Performance Modeling for MPI Applications. Morajko, Oleg; Morajko, Anna; Margalef, Tomàs. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-540-85451-7_8 (book, January 2008)
A scalable tool architecture for diagnosing wait states in massively parallel applications. Geimer, Markus; Wolf, Felix; Wylie, Brian J. N. Parallel Computing, Vol. 35, Issue 7. https://doi.org/10.1016/j.parco.2009.02.003 (journal, July 2009)
HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Adhianto, L.; Banerjee, S.; Fagan, M. Concurrency and Computation: Practice and Experience. https://doi.org/10.1002/cpe.1553 (journal, January 2009)
Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset. Wylie, Brian J. N.; Böhme, David; Mohr, Bernd. 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010). https://doi.org/10.1109/IPDPSW.2010.5470816 (conference, April 2010)
A methodology towards automatic performance analysis of parallel applications. Calzarossa, Maria; Massari, Luisa; Tessera, Daniele. Parallel Computing, Vol. 30, Issue 2. https://doi.org/10.1016/j.parco.2003.08.002 (journal, February 2004)
Predictive analysis of a wavefront application using LogGP. Sundaram-Stukel, David; Vernon, Mary K. ACM SIGPLAN Notices, Vol. 34, Issue 8. https://doi.org/10.1145/329366.301117 (journal, August 1999)

Cited By (1)

Automated Analysis of Time Series Data to Understand Parallel Program Behaviors. Wei, Lai; Mellor-Crummey, John. ICS '18: Proceedings of the 2018 International Conference on Supercomputing. https://doi.org/10.1145/3205289.3205308 (conference, June 2018)

Related Subjects

97 MATHEMATICS AND COMPUTING
Performance analysis
cause analysis
load imbalance
event tracing
MPI
OpenMP
