OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Initial Proposal for MPI 3.0 Error Handling

Abstract

The MPI 2 spec contains error handling and notification mechanisms that have a number of limitations from the point of view of application fault tolerance: (1) The specification makes no demands on MPI to survive failures. Although MPI implementers are encouraged to 'circumscribe the impact of an error, so that normal processing can continue after an error handler was invoked', nothing more is specified in the standard. In particular, the defined MPI error classes are used only to clarify to the user the source of the error and do not describe the MPI functionality that is not available as a result of the error. (2) All errors must somehow be associated with some specific MPI call. As such, (A) It is difficult for MPI to notify users of failures in asynchronous calls, such as an MPI_Rsend call, which may return immediately after the message data is sent along the wire but before it is successfully delivered; (B) There is no provision for asynchronous error notification regarding errors that will affect future calls, such as notifying process p of the failure of process q before p tries to communicate with q. (3) There is no description of when error notification will happen relative to the occurrence of the error. In particular, the specification does not state whether an error that would cause MPI functions to return an error code under the MPI_ERRORS_RETURN error handler would cause a user-defined error handler to be called during the same MPI function or at some earlier or later point in time. (4) Although MPI makes it possible for libraries to define their own error classes and invoke application error handlers, it is not possible for the application to define new error notification patterns either within or across processes. This means that it is not possible for one application process to ask to be informed of errors on other processes or for the application to be informed of specific classes of errors.

Authors:
Bronevetsky, G.
Publication Date:
2008-07-07
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
945669
Report Number(s):
LLNL-TR-405242
TRN: US200904%%120
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; PROCESSING; SPECIFICATIONS; TOLERANCE

Citation Formats

Bronevetsky, G. Initial Proposal for MPI 3.0 Error Handling. United States: N. p., 2008. Web. doi:10.2172/945669.
Bronevetsky, G. Initial Proposal for MPI 3.0 Error Handling. United States. https://doi.org/10.2172/945669
Bronevetsky, G. 2008. "Initial Proposal for MPI 3.0 Error Handling". United States. https://doi.org/10.2172/945669. https://www.osti.gov/servlets/purl/945669.
@techreport{osti_945669,
title = {Initial Proposal for MPI 3.0 Error Handling},
author = {Bronevetsky, G},
abstractNote = {The MPI 2 spec contains error handling and notification mechanisms that have a number of limitations from the point of view of application fault tolerance: (1) The specification makes no demands on MPI to survive failures. Although MPI implementers are encouraged to 'circumscribe the impact of an error, so that normal processing can continue after an error handler was invoked', nothing more is specified in the standard. In particular, the defined MPI error classes are used only to clarify to the user the source of the error and do not describe the MPI functionality that is not available as a result of the error. (2) All errors must somehow be associated with some specific MPI call. As such, (A) It is difficult for MPI to notify users of failures in asynchronous calls, such as an MPI\_Rsend call, which may return immediately after the message data is sent along the wire but before it is successfully delivered; (B) There is no provision for asynchronous error notification regarding errors that will affect future calls, such as notifying process p of the failure of process q before p tries to communicate with q. (3) There is no description of when error notification will happen relative to the occurrence of the error. In particular, the specification does not state whether an error that would cause MPI functions to return an error code under the MPI\_ERRORS\_RETURN error handler would cause a user-defined error handler to be called during the same MPI function or at some earlier or later point in time. (4) Although MPI makes it possible for libraries to define their own error classes and invoke application error handlers, it is not possible for the application to define new error notification patterns either within or across processes. This means that it is not possible for one application process to ask to be informed of errors on other processes or for the application to be informed of specific classes of errors.},
doi = {10.2172/945669},
url = {https://www.osti.gov/biblio/945669},
institution = {Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)},
number = {LLNL-TR-405242},
place = {United States},
year = {2008},
month = jul
}