U.S. Department of Energy
Office of Scientific and Technical Information

On noise and the performance benefit of nonblocking collectives

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [1];  [3]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
  3. ETH Zurich (Switzerland)
Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet, the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise mitigation effects of idealized nonblocking collectives in workloads where these collectives are a major contributor to total execution time. Although nonblocking collectives are unlikely to provide significant noise mitigation to applications in the low-OS-noise environments expected in next-generation HPC systems, we show that they can potentially improve application runtime with respect to other noise types.
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1257977
Report Number(s):
SAND--2014-19529J; 641904
Journal Information:
International Journal of High Performance Computing Applications, Vol. 30, Issue 1; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
