skip to main content

Title: On noise and the performance benefit of nonblocking collectives

Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet, the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise mitigation effects of idealized nonblocking collectives in workloads where these collectives are a major contributor to total execution time. In conclusion, although nonblocking collectives are unlikely to provide significant noise mitigation to applications in the low-OS-noise environments expected in next-generation HPC systems, we show that they can potentially improve application runtime with respect to other noise types.
 [1] ;  [2] ;  [1] ;  [3]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
  3. ETH Zurich (Switzerland)
Publication Date:
OSTI Identifier:
Report Number(s):
Journal ID: ISSN 1094-3420; 641904
Grant/Contract Number:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 30; Journal Issue: 1; Journal ID: ISSN 1094-3420
Research Org:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States
97 MATHEMATICS AND COMPUTING; HPC; collectives; nonblocking; resilience; checkpointing; simulation