U.S. Department of Energy
Office of Scientific and Technical Information

Transparent redundant computing with MPI.

Conference
OSTI ID:1011627
Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of consuming additional resources. We describe a transparent redundancy approach for MPI applications and detail two different implementations that provide the ability to tolerate a range of failure scenarios, including loss of application processes and connectivity. We compare these two approaches and show performance results from micro-benchmarks that bound worst-case message passing performance degradation. We propose several enhancements that could lower the overhead of providing resiliency through redundancy.
Research Organization:
Sandia National Laboratories
Sponsoring Organization:
USDOE
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1011627
Report Number(s):
SAND2010-2571C
Country of Publication:
United States
Language:
English

Similar Records

Redundant Execution of HPC Applications with MR-MPI
Conference · Fri Dec 31 23:00:00 EST 2010 · OSTI ID:1081697

File I/O for MPI Applications in Redundant Execution Scenarios
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID:1037032

Adding Fault Tolerance to NPB Benchmarks Using ULFM
Conference · Thu Dec 31 23:00:00 EST 2015 · OSTI ID:1271876