U.S. Department of Energy
Office of Scientific and Technical Information

Checkpointing for a hybrid computing node

Patent · OSTI ID: 1241311
According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
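
The sequence in the abstract (execute, checkpoint into accelerator-local memory, resume, then transfer the checkpoint to the main processor while the task keeps running) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Accelerator` and `Host` classes, the `run_with_checkpoint` helper, and the toy task state are all hypothetical, and a background Python thread stands in for whatever asynchronous transfer mechanism a real hybrid node would use.

```python
import copy
import threading


class Accelerator:
    """Hypothetical stand-in for a processing accelerator with local memory."""

    def __init__(self):
        self.state = {"step": 0, "acc": 0}  # live task state
        self.local_checkpoint = None        # checkpoint kept in accelerator-local memory

    def run_steps(self, n):
        # Toy task: a running sum of step indices.
        for _ in range(n):
            self.state["step"] += 1
            self.state["acc"] += self.state["step"]

    def create_checkpoint(self):
        # Snapshot the state locally; the live state is free to change afterwards.
        self.local_checkpoint = copy.deepcopy(self.state)

    def restart_from(self, checkpoint):
        # Restart operation: reload the task state from a saved checkpoint.
        self.state = copy.deepcopy(checkpoint)


class Host:
    """Hypothetical stand-in for the main processor's memory."""

    def __init__(self):
        self.saved = None

    def receive(self, accel):
        # Copy the checkpoint out of accelerator-local memory.
        self.saved = copy.deepcopy(accel.local_checkpoint)


def run_with_checkpoint(total_steps, checkpoint_at):
    accel, host = Accelerator(), Host()
    accel.run_steps(checkpoint_at)        # execute the task
    accel.create_checkpoint()             # checkpoint in local memory
    transfer = threading.Thread(target=host.receive, args=(accel,))
    transfer.start()                      # transfer starts in the background
    accel.run_steps(total_steps - checkpoint_at)  # task resumes meanwhile
    transfer.join()
    return accel, host


if __name__ == "__main__":
    accel, host = run_with_checkpoint(total_steps=10, checkpoint_at=4)
    print("host checkpoint:", host.saved)
    print("final state:", accel.state)
```

Because the checkpoint is a separate copy held in local memory, the resumed task never modifies the data being transferred, which is what lets the transfer overlap with execution in this sketch.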
Research Organization:
International Business Machines Corporation, Armonk, NY (United States)
Sponsoring Organization:
USDOE
Assignee:
International Business Machines Corporation (Armonk, NY)
Patent Number(s):
9,280,383
Application Number:
14/302,921
OSTI ID:
1241311
Country of Publication:
United States
Language:
English

