DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Checkpoint triggering in a computer system

Abstract

According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system and determining whether it is time to read a monitor associated with a metric of the task. The monitor is read to determine a value of the metric based on determining that it is time to read the monitor. A threshold for triggering creation of the checkpoint is determined based on the value of the metric. Based on determining that the value of the metric has crossed the threshold, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation.
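The steps in the abstract (decide whether it is time to read the monitor, read the metric, derive a threshold, checkpoint when the metric crosses it) can be sketched as follows. This is only an illustrative sketch, not the patented implementation. The class name `CheckpointTrigger`, the `factor` and `smoothing` parameters, and the rule that sets the threshold to a multiple of a smoothed baseline of earlier readings are all assumptions made for this example. The record does not specify how the threshold is computed.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CheckpointTrigger:
    """Hypothetical helper: polls a monitor and decides when to checkpoint."""
    read_monitor: Callable[[], float]   # returns the current metric value
    read_interval: float                # time between monitor reads
    factor: float = 2.0                 # assumed rule: threshold = factor * baseline
    smoothing: float = 0.5              # assumed weight for the moving baseline
    baseline: Optional[float] = None
    next_read: float = 0.0
    checkpoints: List[Dict[str, Any]] = field(default_factory=list)

    def poll(self, now: float, task_state: Dict[str, Any]) -> bool:
        """Return True if a checkpoint was created at time `now`."""
        if now < self.next_read:        # not yet time to read the monitor
            return False
        self.next_read = now + self.read_interval

        value = self.read_monitor()
        if self.baseline is None:       # first reading only seeds the baseline
            self.baseline = value
            return False

        # Derive the threshold from the metric history (an assumption),
        # then fold the new reading into the baseline for later reads.
        threshold = self.factor * self.baseline
        self.baseline = (1 - self.smoothing) * self.baseline + self.smoothing * value

        if value > threshold:           # metric crossed the threshold
            # Snapshot the task state so execution can restart from it.
            self.checkpoints.append(copy.deepcopy(task_state))
            return True
        return False


# Usage with a fake monitor that yields three readings.
readings = iter([10.0, 11.0, 30.0])
trigger = CheckpointTrigger(read_monitor=lambda: next(readings), read_interval=1.0)
state = {"step": 0}
for t in (0.0, 0.5, 1.0, 2.0):
    state["step"] += 1                  # stand-in for executing the task
    trigger.poll(t, state)
```

In this run the poll at `t = 0.5` is skipped because the read interval has not elapsed, and only the jump to 30.0 exceeds the derived threshold, so a single snapshot of the state is stored.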

Inventors:
Cher, Chen-Yong
Issue Date:
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1320886
Patent Number(s):
9436552
Application Number:
14/302,947
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
B599858
Resource Type:
Patent
Resource Relation:
Patent File Date: 2014 Jun 12
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Cher, Chen-Yong. Checkpoint triggering in a computer system. United States: N. p., 2016. Web.
Cher, Chen-Yong. "Checkpoint triggering in a computer system". United States. https://www.osti.gov/servlets/purl/1320886.
@article{osti_1320886,
title = {Checkpoint triggering in a computer system},
author = {Cher, Chen-Yong},
abstractNote = {According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system and determining whether it is time to read a monitor associated with a metric of the task. The monitor is read to determine a value of the metric based on determining that it is time to read the monitor. A threshold for triggering creation of the checkpoint is determined based on the value of the metric. Based on determining that the value of the metric has crossed the threshold, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation.},
place = {United States},
year = {2016},
month = {9}
}

Works referenced in this record:

Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems
conference, January 2005


DyMeLoR: Dynamic Memory Logger and Restorer Library for Optimistic Simulation Objects with Generic Memory Layout
conference, June 2008

  • Toccaceli, Roberto; Quaglia, Francesco
  • 2008 ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS), 2008 22nd Workshop on Principles of Advanced and Distributed Simulation
  • https://doi.org/10.1109/PADS.2008.23

ickp: a consistent checkpointer for multicomputers
journal, July 1994


Optimizing Checkpoint Sizes in the C3 System
conference, January 2005


The performance of consistent checkpointing
conference, January 1992

  • Elnozahy, E. N.; Johnson, D. B.; Zwaenepoel, W.
  • [1992] 11th Symposium on Reliable Distributed Systems, [1992] Proceedings 11th Symposium on Reliable Distributed Systems
  • https://doi.org/10.1109/RELDIS.1992.235144