OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: rMPI

Software
OSTI ID:1231486

As high-performance computing (HPC) machines continue to grow in size, issues such as fault tolerance and reliability limit application scalability. Current techniques to ensure progress across faults, like checkpoint-restart, are unsuitable on their own for exascale machines because their overheads are predicted to more than double an application's time to solution. Redundant computation is an alternative mechanism for increasing application reliability beyond what checkpoint-restart alone provides. The rMPI library enables portable and transparent redundant computation that, at extreme scale, has significantly lower overhead than checkpoint-restart on its own.
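
The reliability argument for redundant computation can be illustrated with a small probability sketch. This is not rMPI code and not a model taken from the rMPI authors. It assumes, purely for illustration, that every process fails independently with the same probability during a run, and that a logical rank is lost only when all of its replicas fail. The function name `job_failure_probability` and its parameters are hypothetical.

```python
import math


def job_failure_probability(ranks: int, p_fail: float, replicas: int = 1) -> float:
    """Probability that at least one logical rank loses all of its replicas.

    Illustrative assumptions (not from the rMPI record): failures are
    independent and every process fails with the same probability p_fail.
    """
    # A logical rank is lost only if every one of its replicas fails.
    p_rank_lost = p_fail ** replicas
    if p_rank_lost >= 1.0:
        return 1.0
    # The job survives only if no logical rank is lost:
    #   P(job fails) = 1 - (1 - p_rank_lost) ** ranks
    # log1p/expm1 keep the result accurate when p_rank_lost is tiny.
    return -math.expm1(ranks * math.log1p(-p_rank_lost))


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend.
    for replicas in (1, 2, 3):
        print(replicas, job_failure_probability(100_000, 1e-4, replicas))
```

Under these assumptions, adding a replica per rank lowers the chance that a run is interrupted, at the cost of the extra processes. Whether that trade beats checkpoint-restart alone depends on machine scale and failure rates.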

Short Name / Acronym:
RMPI beta version; 002684MLTPL00
Version:
00
Programming Language(s):
Medium: X; OS: Any Unix-based system; Compatibility: Multiplatform
Research Organization:
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
Sponsoring Organization:
USDOE
Contributing Organization:
Kurt B. Ferreira
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1231486
Country of Origin:
United States
