Network support for system initiated checkpoints
Abstract
A system, method, and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to checkpoint system-related data in the presence of messaging activity associated with a user application running at a node. The checkpointing is initiated by the system, so that checkpoint data for a plurality of network nodes may be obtained even while user applications running on highly parallel computers have ongoing messaging activity.
- Inventors:
- Chen, Dong; Heidelberger, Philip
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1532128
- Patent Number(s):
- 8856261
- Application Number:
- 13/729,937
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2012-12-28
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Chen, Dong, and Heidelberger, Philip. Network support for system initiated checkpoints. United States: N. p., 2014. Web.
Chen, Dong, & Heidelberger, Philip. Network support for system initiated checkpoints. United States.
Chen, Dong, and Heidelberger, Philip. Tue . "Network support for system initiated checkpoints". United States. https://www.osti.gov/servlets/purl/1532128.
@article{osti_1532128,
title = {Network support for system initiated checkpoints},
author = {Chen, Dong and Heidelberger, Philip},
abstractNote = {A system, method, and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to checkpoint system-related data in the presence of messaging activity associated with a user application running at a node. The checkpointing is initiated by the system, so that checkpoint data for a plurality of network nodes may be obtained even while user applications running on highly parallel computers have ongoing messaging activity.},
place = {United States},
year = {2014},
month = {10}
}
Works referenced in this record:
Method of checkpointing parallel processes in execution within plurality of process domains
patent, October 2008
- Janakiraman, Gopalakrishnan; Subhraveti, Dinesh Kumar; Santos, Jose Renato G.
- US Patent Document 7,437,606
Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance
patent, September 2003
- Stiffler, Jack J.; Burn, Donald D.
- US Patent Document 6,622,263
Selective preservation of network state during a checkpoint
patent-application, October 2008
- Ganesh, Perinkulam I.; Jain, Vinit; Venkatsubra, Venkat
- US Patent Application 11/741322; 20080267176
Storage access validation to data messages using partial storage address data indexed entries containing permissible address range validation for message source
patent, October 1999
- Fowler, Daniel L.; Baker, William E.; Bunton, William P.
- US Patent Document 5,964,835
Apparatus For Enhancing Performance Of A Parallel Processing Environment, And Associated Methods
patent-application, July 2010
- Howard, Kevin D.
- US Patent Application 12/750338; 20100185719
Methods, media and systems for managing a distributed application running in a plurality of digital processing devices
patent-application, October 2007
- Laadan, Oren; Nieh, Jason; Phung, Dan
- US Patent Application 11/584313; 20070244962