Transparent redundant computing with MPI.
Conference · OSTI ID: 1011627
Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of consuming additional resources. We describe a transparent redundancy approach for MPI applications and detail two different implementations that provide the ability to tolerate a range of failure scenarios, including loss of application processes and connectivity. We compare these two approaches and show performance results from micro-benchmarks that bound worst-case message passing performance degradation. We propose several enhancements that could lower the overhead of providing resiliency through redundancy.
- Research Organization:
- Sandia National Laboratories
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1011627
- Report Number(s):
- SAND2010-2571C
- Country of Publication:
- United States
- Language:
- English
Similar Records
Redundant Execution of HPC Applications with MR-MPI
Conference · Fri Dec 31 23:00:00 EST 2010 · OSTI ID: 1081697
File I/O for MPI Applications in Redundant Execution Scenarios
Conference · Sat Dec 31 23:00:00 EST 2011 · OSTI ID: 1037032
Adding Fault Tolerance to NPB Benchmarks Using ULFM
Conference · Thu Dec 31 23:00:00 EST 2015 · OSTI ID: 1271876