DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Checkpointing for a hybrid computing node

Abstract

According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
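
The sequence in the abstract (checkpoint into accelerator-local memory, resume the task, copy the state data to the main processor while the task keeps running) can be sketched as a small simulation. This is an illustrative sketch only, not the patented implementation: the `Accelerator` and `MainProcessor` classes, the `run_with_checkpoint` helper, and the running-sum task are all hypothetical, and a Python thread stands in for whatever asynchronous transfer mechanism a real hybrid node would use.

```python
import copy
import threading


class Accelerator:
    """Hypothetical stand-in for a processing accelerator with local memory."""

    def __init__(self):
        self.state = {"step": 0, "acc": 0}  # live state of the running task
        self.local_checkpoint = None        # checkpoint held in local memory

    def run_steps(self, n):
        # Execute n steps of the task (assumed here to be a running sum).
        for _ in range(n):
            self.state["step"] += 1
            self.state["acc"] += self.state["step"]

    def create_checkpoint(self):
        # Snapshot the state into local memory. The task pauses only for
        # this local copy, not for the transfer to the main processor.
        self.local_checkpoint = copy.deepcopy(self.state)
        return self.local_checkpoint

    def restart_from(self, checkpoint):
        # Restart operation: reload the task state from a checkpoint.
        self.state = copy.deepcopy(checkpoint)


class MainProcessor:
    """Hypothetical stand-in for the main processor that stores checkpoints."""

    def __init__(self):
        self.saved = None

    def receive(self, checkpoint):
        # Copy the checkpoint's state data into main-processor memory.
        self.saved = copy.deepcopy(checkpoint)


def run_with_checkpoint(total_steps, checkpoint_at):
    accel = Accelerator()
    host = MainProcessor()

    accel.run_steps(checkpoint_at)          # execute the task
    snapshot = accel.create_checkpoint()    # checkpoint in local memory

    # Start the transfer in the background (thread = assumed async copy).
    transfer = threading.Thread(target=host.receive, args=(snapshot,))
    transfer.start()

    # Resume the task; the transfer runs while these steps execute.
    accel.run_steps(total_steps - checkpoint_at)

    transfer.join()
    return accel, host


if __name__ == "__main__":
    accel, host = run_with_checkpoint(total_steps=10, checkpoint_at=4)
    print("final state:", accel.state)
    print("checkpoint on main processor:", host.saved)
```

Because the snapshot is a separate copy in local memory, the resumed task never modifies the data being transferred, which is what lets the transfer proceed while the task is executing.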

Inventors:
Cher, Chen-Yong
Issue Date:
2016 Mar
Research Org.:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1241311
Patent Number(s):
9,280,383
Application Number:
14/302,921
Assignee:
International Business Machines Corporation (Armonk, NY)
DOE Contract Number:  
B599858
Resource Type:
Patent
Resource Relation:
Patent File Date: 2014 Jun 12
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Cher, Chen-Yong. Checkpointing for a hybrid computing node. United States: N. p., 2016. Web.
Cher, Chen-Yong. (2016). Checkpointing for a hybrid computing node. United States.
Cher, Chen-Yong. 2016. "Checkpointing for a hybrid computing node". United States. https://www.osti.gov/servlets/purl/1241311.
@article{osti_1241311,
title = {Checkpointing for a hybrid computing node},
author = {Cher, Chen-Yong},
abstractNote = {According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2016},
month = {3}
}

Works referenced in this record:

Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
journal, July 2011

  • Dong, Xiangyu; Xie, Yuan; Muralimanohar, Naveen
  • ACM Transactions on Architecture and Code Optimization, Vol. 8, Issue 2
  • DOI: 10.1145/1970386.1970387

Checkpointing in hybrid distributed systems
conference, January 2004

  • Cao, Jiannong
  • 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings.
  • DOI: 10.1109/ISPAN.2004.1300471

Adaptive incremental checkpointing for massively parallel systems
conference, January 2004

  • Agarwal, Saurabh; Garg, Rahul; Gupta, Meeta S.
  • Proceedings of the 18th annual international conference on Supercomputing - ICS '04
  • DOI: 10.1145/1006209.1006248

Checkpointing strategies for parallel jobs
conference, January 2011

  • Bougeret, Marin; Casanova, Henri; Rabie, Mikael
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
  • DOI: 10.1145/2063384.2063428

Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
conference, November 2010

  • Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2010.18

Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
conference, January 2010

  • Jones, William M.; Daly, John T.; DeBardeleben, Nathan
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10
  • DOI: 10.1145/1851476.1851509

Low-overhead diskless checkpoint for hybrid computing systems
conference, December 2010

  • Gomez, Leonardo Bautista; Nukada, Akira; Maruyama, Naoya
  • 2010 International Conference on High Performance Computing (HiPC)
  • DOI: 10.1109/HIPC.2010.5713163

MCREngine: A scalable checkpointing system using data-aware aggregation and compression
conference, November 2012

  • Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh
  • 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • DOI: 10.1109/SC.2012.77

Trace profiling: Scalable event tracing on high-end parallel systems
journal, April 2012