DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Checkpoint triggering in a computer system

Abstract

According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system. A monitoring block size is determined for the checkpoint. A checkpoint interval is determined based on the monitoring block size, a checkpoint bandwidth, and a failure rate of the computer system. Based on determining that the checkpoint interval has elapsed, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation. The state data of the checkpoint is restored from a memory responsive to detecting an error condition at the processing node. Execution of the task is restarted in the processing node based on the state data restored from the memory.
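The abstract does not state how the three inputs are combined, so the following is only an illustrative sketch and not the patented method. It assumes the checkpoint cost is the monitoring block size divided by the checkpoint bandwidth, and it substitutes Young's well-known first-order approximation (interval = sqrt(2 * cost / failure rate)) for the unspecified formula. The names `checkpoint_interval` and `CheckpointedTask` are hypothetical.

```python
import math


def checkpoint_interval(block_size_bytes: float,
                        bandwidth_bytes_per_s: float,
                        failures_per_s: float) -> float:
    """Return a checkpoint interval in seconds.

    Assumption: Young's first-order approximation stands in for the
    formula the abstract leaves unspecified.
    """
    if min(block_size_bytes, bandwidth_bytes_per_s, failures_per_s) <= 0:
        raise ValueError("all inputs must be positive")
    # Assumed cost model: time to write one monitoring block.
    checkpoint_cost_s = block_size_bytes / bandwidth_bytes_per_s
    return math.sqrt(2.0 * checkpoint_cost_s / failures_per_s)


class CheckpointedTask:
    """Toy task that saves and restores its state in memory (hypothetical)."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.state = {"step": 0}
        self._saved = dict(self.state)   # in-memory checkpoint copy
        self._last_checkpoint_s = 0.0

    def maybe_checkpoint(self, now_s: float) -> bool:
        """Create a checkpoint only if the interval has elapsed."""
        if now_s - self._last_checkpoint_s >= self.interval_s:
            self._saved = dict(self.state)
            self._last_checkpoint_s = now_s
            return True
        return False

    def restore(self) -> None:
        """Restore state from the last checkpoint after an error."""
        self.state = dict(self._saved)
```

In this sketch a caller would compute the interval once, call `maybe_checkpoint` periodically while the task runs, and call `restore` before restarting the task when an error condition is detected.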

Inventors:
Cher, Chen-Yong
Issue Date:
03/10/2020
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1637876
Patent Number(s):
10585753
Application Number:
16/033,274
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:
B599858
Resource Type:
Patent
Resource Relation:
Patent File Date: 07/12/2018
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Cher, Chen-Yong. Checkpoint triggering in a computer system. United States: N. p., 2020. Web.
Cher, Chen-Yong. Checkpoint triggering in a computer system. United States.
Cher, Chen-Yong. 2020. "Checkpoint triggering in a computer system". United States. https://www.osti.gov/servlets/purl/1637876.
@article{osti_1637876,
title = {Checkpoint triggering in a computer system},
author = {Cher, Chen-Yong},
abstractNote = {According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system. A monitoring block size is determined for the checkpoint. A checkpoint interval is determined based on the monitoring block size, a checkpoint bandwidth, and a failure rate of the computer system. Based on determining that the checkpoint interval has elapsed, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation. The state data of the checkpoint is restored from a memory responsive to detecting an error condition at the processing node. Execution of the task is restarted in the processing node based on the state data restored from the memory.},
place = {United States},
year = {2020},
month = {3}
}

Works referenced in this record:

Checkpoint triggering in a computer system
patent, September 2016

Checkpoint Triggering in a Computer System
patent-application, October 2016