On noise and the performance benefit of nonblocking collectives

Widener, Patrick M.; Levy, Scott; Ferreira, Kurt B.; Hoefler, Torsten

doi:10.1177/1094342015611952

Journal Article · 2 November 2015 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342015611952 · OSTI ID: 1257977

Widener, Patrick M. [1]; Levy, Scott [2]; Ferreira, Kurt B. [1]; Hoefler, Torsten [3]

[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[2] Univ. of New Mexico, Albuquerque, NM (United States)
[3] ETH Zurich (Switzerland)

Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise mitigation effects of idealized nonblocking collectives in workloads where these collectives are a major contributor to total execution time. Although nonblocking collectives are unlikely to provide significant noise mitigation to applications in the low-OS-noise environments expected in next-generation HPC systems, we show that they can potentially improve application runtime with respect to other noise types.

Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1257977

Report Number(s): SAND--2014-19529J; 641904

Journal Information: International Journal of High Performance Computing Applications, Vol. 30, Issue 1; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Cited By (1)

The unexpected virtue of almost: Exploiting MPI collective operations to approximately coordinate checkpoints. Levy, Scott; Ferreira, Kurt B.; Widener, Patrick. Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3 (journal, September 2018). https://doi.org/10.1002/cpe.4890

Related Subjects

97 MATHEMATICS AND COMPUTING
HPC
checkpointing
collectives
nonblocking
resilience
simulation
