Checkpoint triggering in a computer system
Abstract
According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system and determining whether it is time to read a monitor associated with a metric of the task. The monitor is read to determine a value of the metric based on determining that it is time to read the monitor. A threshold for triggering creation of the checkpoint is determined based on the value of the metric. Based on determining that the value of the metric has crossed the threshold, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation.
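The trigger loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The monitor is modeled as a hypothetical zero-argument callable. "Time to read the monitor" is assumed to mean every `read_interval` task steps. The threshold policy (`value + budget`, re-derived from the reading taken when a checkpoint fires) is invented here, because the abstract says only that the threshold is determined from the metric's value. The names `CheckpointTrigger`, `maybe_checkpoint`, `restart_state` and `run_task` are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CheckpointTrigger:
    """Hypothetical monitor-driven checkpoint trigger (illustrative sketch only)."""
    read_monitor: Callable[[], float]  # assumed: returns the current value of the task metric
    read_interval: int                 # assumed: read the monitor every N task steps
    budget: float                      # assumed policy parameter: threshold = value + budget
    threshold: Optional[float] = None
    checkpoints: List[Dict] = field(default_factory=list)

    def is_time_to_read(self, step: int) -> bool:
        # Decide whether this step is a monitor-read point.
        return step % self.read_interval == 0

    def derive_threshold(self, value: float) -> float:
        # Assumed policy: the next checkpoint fires once the metric grows by `budget`.
        return value + self.budget

    def maybe_checkpoint(self, step: int, state: Dict) -> bool:
        if not self.is_time_to_read(step):
            return False
        value = self.read_monitor()
        if self.threshold is None:
            # First reading only establishes the initial threshold.
            self.threshold = self.derive_threshold(value)
            return False
        if value >= self.threshold:
            # Metric crossed the threshold: save a copy of the task state,
            # then re-derive the threshold from the current reading.
            self.checkpoints.append(copy.deepcopy(state))
            self.threshold = self.derive_threshold(value)
            return True
        return False

    def restart_state(self) -> Optional[Dict]:
        # State to resume from on a restart operation (latest checkpoint, if any).
        return copy.deepcopy(self.checkpoints[-1]) if self.checkpoints else None


def run_task(steps: int, trigger: CheckpointTrigger, metric: Dict[str, float]) -> Dict:
    """Toy task: accumulates a running sum while a simulated metric grows by 1 per step."""
    state = {"step": 0, "partial_sum": 0}
    for step in range(1, steps + 1):
        state["step"] = step
        state["partial_sum"] += step
        metric["events"] += 1.0  # stand-in for whatever quantity the monitor observes
        trigger.maybe_checkpoint(step, state)
    return state


if __name__ == "__main__":
    metric = {"events": 0.0}
    trigger = CheckpointTrigger(read_monitor=lambda: metric["events"],
                                read_interval=5, budget=10.0)
    run_task(30, trigger, metric)
    print(trigger.restart_state())  # {'step': 25, 'partial_sum': 325}
```

Because the monitor is read only at intervals, a crossing is detected at the first read at or after it occurs. Under this assumed policy, a metric that grows faster reaches each re-derived threshold sooner and therefore produces checkpoints more frequently.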
- Inventors:
- Cher, Chen-Yong
- Issue Date:
- 2016 Sep 06
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1320886
- Patent Number(s):
- 9436552
- Application Number:
- 14/302,947
- Assignee:
- INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B599858
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2014 Jun 12
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Cher, Chen-Yong. Checkpoint triggering in a computer system. United States: N. p., 2016. Web.
Cher, Chen-Yong. Checkpoint triggering in a computer system. United States.
Cher, Chen-Yong. 2016. "Checkpoint triggering in a computer system". United States. https://www.osti.gov/servlets/purl/1320886.
@article{osti_1320886,
title = {Checkpoint triggering in a computer system},
author = {Cher, Chen-Yong},
abstractNote = {According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system and determining whether it is time to read a monitor associated with a metric of the task. The monitor is read to determine a value of the metric based on determining that it is time to read the monitor. A threshold for triggering creation of the checkpoint is determined based on the value of the metric. Based on determining that the value of the metric has crossed the threshold, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2016},
month = {Sep}
}
Works referenced in this record:
System and method for providing checkpointing with precompile directives and supporting software to produce checkpoints, independent of environment constraints
patent, December 2000
- Ramkumar, Balkrishna; Strumpen, Volker
- US Patent Document 6,161,219
Computer system, management computer, storage system, and backup management method
patent, August 2008
- Okada, Wataru; Sato, Masahide; Emaru, Hironori
- US Patent Document 7,409,414
Template based parallel checkpointing in a massively parallel computer system
patent, December 2009
- Archer, Charles J.; Inglett, Todd A.
- US Patent Document 7,627,783
Risk indices for enhanced throughput in computing systems
patent, July 2011
- Votta, Jr., Lawrence G.; Whisnant, Keith A.; Gross, Kenny C.
- US Patent Document 7,975,175
Fault tolerant computing systems using checkpoints
patent, August 2014
- Bissett, Thomas D.; Leveille, Paul A.; Lin, Ted
- US Patent Document 8,812,907
Optimum checkpoint frequency
patent, November 2014
- Reiss, Charles; Malewicz, Grzegorz; Austern, Matthew H.
- US Patent Document 8,880,941
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems
conference, January 2005
- Janakiraman, G. J.; Santos, J. R.; Subhraveti, D.
- 2005 International Conference on Dependable Systems and Networks (DSN'05)
DyMeLoR: Dynamic Memory Logger and Restorer Library for Optimistic Simulation Objects with Generic Memory Layout
conference, June 2008
- Toccaceli, Roberto; Quaglia, Francesco
- 2008 22nd Workshop on Principles of Advanced and Distributed Simulation (PADS)
ickp: a consistent checkpointer for multicomputers
journal, July 1994
- Plank, J. S.
- IEEE Parallel & Distributed Technology: Systems & Applications, Vol. 2, Issue 2
Optimizing Checkpoint Sizes in the C3 System
conference, January 2005
- Marques, D.; Bronevetsky, G.; Fernandes, R.
- 19th IEEE International Parallel and Distributed Processing Symposium
The performance of consistent checkpointing
conference, January 1992
- Elnozahy, E. N.; Johnson, D. B.; Zwaenepoel, W.
- Proceedings of the 11th Symposium on Reliable Distributed Systems (1992)