DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Network support for system initiated checkpoints

Abstract

A system, method and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at a node. The checkpointing is initiated by the system, so that checkpoint data from a plurality of network nodes can be obtained even while user applications running on highly parallel computers are engaged in ongoing messaging activity.
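The abstract does not spell out the mechanism. As a rough illustration only, the Python sketch below models one generic way a system could take a checkpoint while user messaging is in progress. It gates new user injections with a control flag, drains traffic already in the network, snapshots per-node message state, and then reopens the gate. The `Node` and `Network` classes, the FIFO names, the one-message-per-cycle timing, and the drain-then-snapshot ordering are all hypothetical assumptions made for this example. They are not taken from the patent and are not the patented method.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    # Hypothetical user message queue: (destination node id, payload) pairs not yet sent.
    injection_fifo: deque = field(default_factory=deque)
    # Payloads that have arrived at this node.
    reception_fifo: deque = field(default_factory=deque)
    # Hypothetical per-node gate driven by a system control signal.
    user_injection_enabled: bool = True


@dataclass
class Network:
    nodes: list
    # Messages injected but not yet delivered: (destination node id, payload).
    in_flight: deque = field(default_factory=deque)

    def step(self) -> None:
        """One cycle: each enabled node injects one message, then one in-flight message is delivered."""
        for node in self.nodes:
            if node.user_injection_enabled and node.injection_fifo:
                self.in_flight.append(node.injection_fifo.popleft())
        if self.in_flight:
            dest, payload = self.in_flight.popleft()
            self.nodes[dest].reception_fifo.append(payload)

    def system_checkpoint(self) -> list:
        """System-initiated checkpoint taken while user messaging is in progress."""
        # 1. Control signal: stop new user injections on every node.
        for node in self.nodes:
            node.user_injection_enabled = False
        # 2. Drain traffic already in the network so no message is lost in flight.
        while self.in_flight:
            self.step()
        # 3. Snapshot each node's unsent and received messages.
        snapshot = [
            (n.node_id, list(n.injection_fifo), list(n.reception_fifo))
            for n in self.nodes
        ]
        # 4. Release the gate so the user application resumes messaging.
        for node in self.nodes:
            node.user_injection_enabled = True
        return snapshot


# Usage: two nodes exchange messages; the checkpoint is taken mid-exchange.
net = Network(nodes=[Node(0), Node(1)])
net.nodes[0].injection_fifo.extend([(1, "a"), (1, "b")])
net.nodes[1].injection_fifo.extend([(0, "x"), (0, "y")])
net.step()  # leaves (0, "x") still in flight
snapshot = net.system_checkpoint()
print(snapshot)  # [(0, [(1, 'b')], ['x']), (1, [(0, 'y')], ['a'])]
```

In this toy model the snapshot accounts for all four messages (two unsent, two delivered). That conservation is the property draining before snapshotting is meant to preserve. A real machine would also have to capture hardware network and messaging-unit state, which this sketch ignores.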

Inventors:
Chen, Dong; Heidelberger, Philip
Issue Date:
October 2014
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1532128
Patent Number(s):
8856261
Application Number:
13/729,937
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS > G06 - COMPUTING > G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B554331
Resource Type:
Patent
Resource Relation:
Patent File Date: 2012-12-28
Country of Publication:
United States
Language:
English

Citation Formats

Chen, Dong, and Heidelberger, Philip. Network support for system initiated checkpoints. United States: N. p., 2014. Web.
Chen, Dong, & Heidelberger, Philip. Network support for system initiated checkpoints. United States.
Chen, Dong, and Heidelberger, Philip. 2014. "Network support for system initiated checkpoints". United States. https://www.osti.gov/servlets/purl/1532128.
@article{osti_1532128,
title = {Network support for system initiated checkpoints},
author = {Chen, Dong and Heidelberger, Philip},
abstractNote = {A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {10}
}

Works referenced in this record:

Method of checkpointing parallel processes in execution within plurality of process domains
patent, October 2008

Selective preservation of network state during a checkpoint
patent application, October 2008