U.S. Department of Energy
Office of Scientific and Technical Information

Evolving the message-passing model via an object-oriented fault-tolerant transport layer.

Conference

Abstract not provided.

Research Organization:
Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1364784
Report Number(s):
SAND2015-4738C; 625673
Country of Publication:
United States
Language:
English

References (13)

Fault Tolerance in Message Passing Interface Programs · journal · August 2004
The Byzantine Generals Problem · journal · July 1982
Evaluating User-Level Fault Tolerance for MPI Applications · conference · January 2014
Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery · conference · August 2009
Starfish: fault-tolerant dynamic MPI programs on clusters of workstations · conference · January 1999
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI · book · January 2011
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System · conference · November 2010
  • Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC) https://doi.org/10.1109/SC.2010.18
Legion: Expressing locality and independence with logical regions · conference · November 2012
  • Bauer, Michael; Treichler, Sean; Slaughter, Elliott
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC) https://doi.org/10.1109/SC.2012.71
Fault-tolerant communication runtime support for data-centric programming models · conference · December 2010
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications · conference · May 2011
  • Guermouche, Amina; Ropars, Thomas; Brunet, Elisabeth
  • 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS) https://doi.org/10.1109/IPDPS.2011.95
Toward Local Failure Local Recovery Resilience Model using MPI-ULFM · conference · January 2014
Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales · conference · November 2014
  • Gamell, Marc; Katz, Daniel S.; Kolla, Hemanth
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2014.78
A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE · book · January 2011

Similar Records

Evolving the message passing programming model via a fault-tolerant object-oriented transport layer.
Conference · Sat Feb 28 23:00:00 EST 2015 · OSTI ID:1331635

rMPI : increasing fault resiliency in a message-passing environment.
Conference · Thu Apr 01 00:00:00 EDT 2010 · OSTI ID:1002112

Fault tolerant programming models.
Conference · Thu Feb 28 23:00:00 EST 2013 · OSTI ID:1657450
