Checkpoint triggering in a computer system
According to an aspect, a method for triggering creation of a checkpoint in a computer system includes executing a task in a processing node of the computer system. A monitoring block size is determined for the checkpoint. A checkpoint interval is determined based on the monitoring block size, a checkpoint bandwidth, and a failure rate of the computer system. Based on determining that the checkpoint interval has elapsed, the checkpoint including state data of the task is created to enable restarting execution of the task upon a restart operation. The state data of the checkpoint is restored from a memory responsive to detecting an error condition at the processing node. Execution of the task is restarted in the processing node based on the state data restored from the memory.
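The abstract says the checkpoint interval depends on the monitoring block size, the checkpoint bandwidth, and the failure rate, but it does not give a formula. The sketch below assumes Young's first-order approximation, T = sqrt(2 · C · MTBF). Here the checkpoint cost C is taken as block size divided by bandwidth, and MTBF is taken as 1 / failure rate. This is a swapped-in textbook rule, not necessarily the patented method. The names `checkpoint_interval` and `CheckpointedTask`, the in-process dict standing in for the checkpoint memory, the initial checkpoint at time zero, and the simulated clock are all hypothetical illustration choices.

```python
import copy
import math
from typing import Iterable, Optional


def checkpoint_interval(block_size_bytes: float,
                        bandwidth_bytes_per_s: float,
                        failure_rate_per_s: float) -> float:
    """Return a checkpoint interval in seconds.

    ASSUMPTION: Young's approximation T = sqrt(2 * C * MTBF), where
    C = block size / bandwidth and MTBF = 1 / failure rate.
    """
    if min(block_size_bytes, bandwidth_bytes_per_s, failure_rate_per_s) <= 0:
        raise ValueError("all inputs must be positive")
    cost_s = block_size_bytes / bandwidth_bytes_per_s  # time to write one checkpoint
    mtbf_s = 1.0 / failure_rate_per_s                  # mean time between failures
    return math.sqrt(2.0 * cost_s * mtbf_s)


class CheckpointedTask:
    """Hypothetical task wrapper that checkpoints on a timer and restarts on error."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.state = {"step": 0, "total": 0}   # the task's state data
        self.memory: Optional[dict] = None     # stands in for checkpoint memory
        self.last_checkpoint_t = 0.0
        self.restarts = 0

    def create_checkpoint(self, now: float) -> None:
        # Copy the state so later mutation does not corrupt the checkpoint.
        self.memory = copy.deepcopy(self.state)
        self.last_checkpoint_t = now

    def restore(self) -> None:
        if self.memory is None:
            raise RuntimeError("no checkpoint to restore from")
        self.state = copy.deepcopy(self.memory)
        self.restarts += 1

    def run(self, steps: int, step_time_s: float,
            fail_at: Iterable[int] = ()) -> int:
        """Sum 0..steps-1, simulating one transient error at each step in fail_at."""
        now = 0.0
        self.create_checkpoint(now)  # ASSUMPTION: initial checkpoint at t = 0
        pending_failures = set(fail_at)
        while self.state["step"] < steps:
            now += step_time_s
            step = self.state["step"]
            if step in pending_failures:
                # Error detected: restore state data and restart from the checkpoint.
                pending_failures.discard(step)
                self.restore()
                continue
            self.state["total"] += step
            self.state["step"] = step + 1
            # Create a checkpoint once the interval has elapsed.
            if now - self.last_checkpoint_t >= self.interval_s:
                self.create_checkpoint(now)
        return self.state["total"]
```

For example, an 8 GB block written at 1 GB/s (C = 8 s) with a failure rate of 1e-4 per second (MTBF = 10,000 s) gives `checkpoint_interval(8e9, 1e9, 1e-4)` = 400.0 seconds. Passing that interval to `CheckpointedTask` and injecting an error at step 7 rolls the task back to its most recent checkpoint. It still finishes with the same result as an error-free run, paying only the cost of the redone steps.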
- Research Organization: International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: B599858
- Assignee: International Business Machines Corporation (Armonk, NY)
- Patent Number(s): 10,585,753
- Application Number: 16/033,274
- OSTI ID: 1637876
- Resource Relation: Patent File Date: 07/12/2018
- Country of Publication: United States
- Language: English
Similar Records
Checkpoint triggering in a computer system
Checkpointing for a hybrid computing node