Non-volatile memory for checkpoint storage
Abstract
A system, method and computer program product for supporting system-initiated checkpoints in high-performance parallel computing systems and storing checkpoint data to a non-volatile memory storage device. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at a node. Because the checkpointing is initiated by the system, checkpoint data for a plurality of network nodes may be obtained even while user applications running on highly parallel computers have ongoing messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
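- Inventors:
- Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan; Heidelberger, Philip; Jeanson, Mark J.; Kopcsay, Gerard V.; Ohmacht, Martin; Takken, Todd E.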
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1149606
- Patent Number(s):
- 8788879
- Application Number:
- 13/004,005
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Blumrich, Matthias A., Chen, Dong, Cipolla, Thomas M., Coteus, Paul W., Gara, Alan, Heidelberger, Philip, Jeanson, Mark J., Kopcsay, Gerard V., Ohmacht, Martin, and Takken, Todd E. Non-volatile memory for checkpoint storage. United States: N. p., 2014. Web.
Blumrich, Matthias A., Chen, Dong, Cipolla, Thomas M., Coteus, Paul W., Gara, Alan, Heidelberger, Philip, Jeanson, Mark J., Kopcsay, Gerard V., Ohmacht, Martin, & Takken, Todd E. Non-volatile memory for checkpoint storage. United States.
Blumrich, Matthias A., Chen, Dong, Cipolla, Thomas M., Coteus, Paul W., Gara, Alan, Heidelberger, Philip, Jeanson, Mark J., Kopcsay, Gerard V., Ohmacht, Martin, and Takken, Todd E. 2014. "Non-volatile memory for checkpoint storage". United States. https://www.osti.gov/servlets/purl/1149606.
@article{osti_1149606,
title = {Non-volatile memory for checkpoint storage},
author = {Blumrich, Matthias A. and Chen, Dong and Cipolla, Thomas M. and Coteus, Paul W. and Gara, Alan and Heidelberger, Philip and Jeanson, Mark J. and Kopcsay, Gerard V. and Ohmacht, Martin and Takken, Todd E.},
abstractNote = {A system, method and computer program product for supporting system-initiated checkpoints in high-performance parallel computing systems and storing checkpoint data to a non-volatile memory storage device. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at a node. Because the checkpointing is initiated by the system, checkpoint data for a plurality of network nodes may be obtained even while user applications running on highly parallel computers have ongoing messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {7}
}
Works referenced in this record:
Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance
patent, September 2003
- Stiffler, Jack J.; Burn, Donald D.
- US Patent Document 6,622,263
Method and system for providing reliability and availability in a distributed component object model (DCOM) object oriented system
patent, July 2006
- Wang, Yi-Min
- US Patent Document 7,082,553
Consistent asynchronous checkpointing of multithreaded application programs based on active replication
patent, December 2007
- Moser, Louise E.; Melliar-Smith, Peter M.
- US Patent Document 7,305,582
Method of checkpointing parallel processes in execution within plurality of process domains
patent, October 2008
- Janakiraman, Gopalakrishnan; Subhraveti, Dinesh Kumar; Santos, Jose Renato G.
- US Patent Document 7,437,606
Checkpointing in massively parallel processing
patent, January 2012
- Muralimanohar, Naveen; Jouppi, Norman Paul
- US Patent Document 8,108,718
Novel massively parallel supercomputer
patent-application, May 2004
- Blumrich, Matthias A.; Chen, Dong; Chiu, George L.
- US Patent Application 10/468993; 20040103218
Method of checkpointing parallel processes in execution within plurality of process domains
patent-application, February 2006
- Janakiraman, Gopalakrishnan; Subhraveti, Dinesh Kumar; Santos, Jose Renato
- US Patent Application 10/924513; 20060041786
Methods, media and systems for managing a distributed application running in a plurality of digital processing devices
patent-application, October 2007
- Laadan, Oren; Nieh, Jason; Phung, Dan
- US Patent Application 11/584313; 20070244962
Selective preservation of network state during a checkpoint
patent-application, October 2008
- Ganesh, Perinkulam I.; Jain, Vinit; Venkatsubra, Venkat
- US Patent Application 11/741322; 20080267176
Apparatus For Enhancing Performance Of A Parallel Processing Environment, And Associated Methods
patent-application, July 2010
- Howard, Kevin D.
- US Patent Application 12/750338; 20100185719