OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Checkpointing for a hybrid computing node

Patent · OSTI ID: 1241311

According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
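
The sequence in the abstract (checkpoint into accelerator-local memory, resume the task, move the checkpoint to the main processor in the background) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the accelerator is modeled as a plain Python object, the device-to-host copy as a background thread, and every name (`SimulatedAccelerator`, `transfer_to_host`, `run_with_checkpoint`, `restart_from`) is hypothetical.

```python
import copy
import threading


class SimulatedAccelerator:
    """Toy stand-in (hypothetical) for an accelerator with its own local memory."""

    def __init__(self):
        self.state = {"step": 0, "acc": 0}  # live task state
        self.local_checkpoint = None        # checkpoint kept in accelerator-local memory

    def run_steps(self, n):
        # The "task": a running sum, so the state is easy to verify by hand.
        for _ in range(n):
            self.state["step"] += 1
            self.state["acc"] += self.state["step"]

    def create_local_checkpoint(self):
        # The task is paused only for this local snapshot.
        self.local_checkpoint = copy.deepcopy(self.state)


def transfer_to_host(accel, host_store):
    # Stands in for the accelerator-to-main-processor copy.
    # It reads only the snapshot, never the live state.
    host_store["checkpoint"] = copy.deepcopy(accel.local_checkpoint)


def run_with_checkpoint(total_steps, checkpoint_at):
    accel = SimulatedAccelerator()
    host_store = {}
    accel.run_steps(checkpoint_at)
    accel.create_local_checkpoint()
    mover = threading.Thread(target=transfer_to_host, args=(accel, host_store))
    mover.start()                                 # transfer begins in the background
    accel.run_steps(total_steps - checkpoint_at)  # task resumes without waiting for it
    mover.join()
    return accel, host_store


def restart_from(host_store, remaining_steps):
    # Restart: reload the saved state into a fresh accelerator and continue.
    accel = SimulatedAccelerator()
    accel.state = copy.deepcopy(host_store["checkpoint"])
    accel.run_steps(remaining_steps)
    return accel
```

Because the background copy touches only the snapshot, the resumed task can keep mutating its live state without corrupting the checkpoint. On real hardware the same separation would typically rely on a dedicated device buffer and an asynchronous copy mechanism, which this simulation does not model.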

Research Organization:
International Business Machines Corp., Armonk, NY (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
B599858
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Number(s):
9,280,383
Application Number:
14/302,921
OSTI ID:
1241311
Resource Relation:
Patent File Date: 2014 Jun 12
Country of Publication:
United States
Language:
English

References (10)

Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
journal July 2011
Checkpointing in hybrid distributed systems
conference January 2004
Adaptive incremental checkpointing for massively parallel systems
conference January 2004
Checkpointing strategies for parallel jobs
  • Bougeret, Marin; Casanova, Henri; Rabie, Mikael
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11 https://doi.org/10.1145/2063384.2063428
conference January 2011
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
  • Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
  • 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2010.18
conference November 2010
Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
  • Jones, William M.; Daly, John T.; DeBardeleben, Nathan
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10 https://doi.org/10.1145/1851476.1851509
conference January 2010
Low-overhead diskless checkpoint for hybrid computing systems
conference December 2010
MCREngine: A scalable checkpointing system using data-aware aggregation and compression
  • Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh
  • 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2012.77
conference November 2012
Trace profiling: Scalable event tracing on high-end parallel systems
journal April 2012
Apparatus, system, and method for caching data
patent July 2013
