Checkpointing for a hybrid computing node
Abstract
According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
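The sequence in the abstract (execute, checkpoint into accelerator-local memory, resume, then move the checkpoint to the main processor while the task keeps running) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Accelerator` class, the `transfer_to_host` helper, the `host_store` dictionary, and the use of a Python thread to stand in for an asynchronous accelerator-to-host copy are all hypothetical names and choices introduced here for illustration.

```python
import copy
import threading


class Accelerator:
    """Toy stand-in (hypothetical) for an accelerator with its own local memory."""

    def __init__(self):
        self.state = {"step": 0, "acc": 0}  # live state of the running task
        self.local_checkpoint = None         # checkpoint kept in local memory

    def run_steps(self, n):
        # Execute n steps of the task, mutating only the live state.
        for _ in range(n):
            self.state["step"] += 1
            self.state["acc"] += self.state["step"]

    def create_checkpoint(self):
        # The task is paused only for the duration of this local copy.
        self.local_checkpoint = copy.deepcopy(self.state)

    def restart_from(self, checkpoint):
        # Restart operation: restore the task state from a saved checkpoint.
        self.state = copy.deepcopy(checkpoint)


def transfer_to_host(accel, host_store):
    # Runs concurrently with the task; it reads only the frozen snapshot,
    # so it never races with updates to the live state.
    host_store["checkpoint"] = copy.deepcopy(accel.local_checkpoint)


def run_with_checkpoint():
    accel, host_store = Accelerator(), {}
    accel.run_steps(3)         # execute the task
    accel.create_checkpoint()  # checkpoint in accelerator-local memory
    mover = threading.Thread(target=transfer_to_host, args=(accel, host_store))
    mover.start()              # transfer to the main processor begins ...
    accel.run_steps(2)         # ... while the task has already resumed
    mover.join()
    return accel, host_store
```

Calling `run_with_checkpoint()` leaves the host copy holding the state as of step 3 while the task itself has advanced to step 5; passing that host copy to `restart_from` rolls the task back to the checkpointed state. The point of the design is that the task stalls only for the fast local copy, not for the slower transfer to the main processor.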
- Inventors: Cher, Chen-Yong
- Issue Date: 2016 Mar
- Research Org.: International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.: USDOE
- OSTI Identifier: 1241311
- Patent Number(s): 9280383
- Application Number: 14/302,921
- Assignee: International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs): G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number: B599858
- Resource Type: Patent
- Resource Relation: Patent File Date: 2014 Jun 12
- Country of Publication: United States
- Language: English
- Subject: 97 MATHEMATICS AND COMPUTING
Citation Formats
Cher, Chen-Yong. Checkpointing for a hybrid computing node. United States: N. p., 2016. Web.
Cher, Chen-Yong. Checkpointing for a hybrid computing node. United States.
Cher, Chen-Yong. 2016. "Checkpointing for a hybrid computing node". United States. https://www.osti.gov/servlets/purl/1241311.
@article{osti_1241311,
title = {Checkpointing for a hybrid computing node},
author = {Cher, Chen-Yong},
abstractNote = {According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2016},
month = {3}
}
Works referenced in this record:
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
journal, July 2011
- Dong, Xiangyu; Xie, Yuan; Muralimanohar, Naveen
- ACM Transactions on Architecture and Code Optimization, Vol. 8, Issue 2
Checkpointing in hybrid distributed systems
conference, January 2004
- Cao, Jiannong
- 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings.
Adaptive incremental checkpointing for massively parallel systems
conference, January 2004
- Agarwal, Saurabh; Garg, Rahul; Gupta, Meeta S.
- Proceedings of the 18th annual international conference on Supercomputing - ICS '04
Checkpointing strategies for parallel jobs
conference, January 2011
- Bougeret, Marin; Casanova, Henri; Rabie, Mikael
- Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis - SC '11
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
conference, November 2010
- Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
conference, January 2010
- Jones, William M.; Daly, John T.; DeBardeleben, Nathan
- Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10
Low-overhead diskless checkpoint for hybrid computing systems
conference, December 2010
- Gomez, Leonardo Bautista; Nukada, Akira; Maruyama, Naoya
- 2010 International Conference on High Performance Computing (HiPC)
MCREngine: A scalable checkpointing system using data-aware aggregation and compression
conference, November 2012
- Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh
- 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Trace profiling: Scalable event tracing on high-end parallel systems
journal, April 2012
- Mohror, Kathryn; Karavanic, Karen L.
- Parallel Computing, Vol. 38, Issue 4-5, p. 194-225
Apparatus, system, and method for caching data
patent, July 2013
- Flynn, David; Atkisson, David; Aune, Joshua
- US Patent Document 8,489,817