DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Adaptive multi-level checkpointing

Abstract

In some examples, with respect to adaptive multi-level checkpointing, a transfer parameter associated with transfer of checkpoint data from a node-local storage to a parallel file system may be ascertained for the checkpoint data stored in the node-local storage. The transfer parameter may be compared to a specified transfer parameter threshold. A determination may be made, based on the comparison of the transfer parameter to the specified transfer parameter threshold, as to whether to transfer the checkpoint data from the node-local storage to the parallel file system.
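The decision flow in the abstract (ascertain a transfer parameter, compare it to a threshold, decide whether to transfer) can be illustrated with a minimal Python sketch. The abstract does not say what the transfer parameter is or which direction of the comparison triggers a transfer. The sketch assumes the parameter is an estimated copy time and that the transfer proceeds when that time does not exceed the threshold. All names (`Checkpoint`, `transfer_parameter`, `should_transfer`) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Checkpoint data held in node-local storage (hypothetical record)."""
    name: str
    size_bytes: int


def transfer_parameter(ckpt: Checkpoint, pfs_bandwidth_bytes_per_s: float) -> float:
    """Assumed transfer parameter: estimated seconds to copy the
    checkpoint from node-local storage to the parallel file system."""
    if pfs_bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return ckpt.size_bytes / pfs_bandwidth_bytes_per_s


def should_transfer(parameter: float, threshold: float) -> bool:
    """Compare the parameter to the specified threshold.

    Assumption: transfer when the parameter does not exceed the threshold.
    The abstract leaves the direction of the comparison unspecified.
    """
    return parameter <= threshold


# A 4 GB checkpoint and an assumed 2 GB/s path to the parallel file system.
ckpt = Checkpoint("step_100", size_bytes=4 * 10**9)
param = transfer_parameter(ckpt, pfs_bandwidth_bytes_per_s=2 * 10**9)
decision = should_transfer(param, threshold=5.0)
```

With these assumed numbers the estimated copy time is 2.0 seconds, which is under the 5.0 second threshold, so `decision` is `True`. A different parameter (for example, the number of checkpoints since the last flush) would fit the same compare-and-decide structure.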

Inventors:
Xu, Cong; Akgun, Itir; Faraboschi, Paolo
Issue Date:
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1735238
Patent Number(s):
10769017
Application Number:
15/960,302
Assignee:
Hewlett-Packard Development Company, L.P. (Houston, TX)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
H - ELECTRICITY H04 - ELECTRIC COMMUNICATION TECHNIQUE H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
DOE Contract Number:  
AC52-07NA27344
Resource Type:
Patent
Resource Relation:
Patent File Date: 04/23/2018
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Xu, Cong, Akgun, Itir, and Faraboschi, Paolo. Adaptive multi-level checkpointing. United States: N. p., 2020. Web.
Xu, Cong, Akgun, Itir, & Faraboschi, Paolo. Adaptive multi-level checkpointing. United States.
Xu, Cong, Akgun, Itir, and Faraboschi, Paolo. 2020. "Adaptive multi-level checkpointing". United States. https://www.osti.gov/servlets/purl/1735238.
@article{osti_1735238,
title = {Adaptive multi-level checkpointing},
author = {Xu, Cong and Akgun, Itir and Faraboschi, Paolo},
abstractNote = {In some examples, with respect to adaptive multi-level checkpointing, a transfer parameter associated with transfer of checkpoint data from a node-local storage to a parallel file system may be ascertained for the checkpoint data stored in the node-local storage. The transfer parameter may be compared to a specified transfer parameter threshold. A determination may be made, based on the comparison of the transfer parameter to the specified transfer parameter threshold, as to whether to transfer the checkpoint data from the node-local storage to the parallel file system.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {9}
}

Works referenced in this record:

Checkpointing using compute node health information
patent-application, June 2019


Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems
journal, February 2017


Method and System for Enabling Checkpointing Fault Tolerance Across Remote Virtual Machines
patent-application, November 2011


Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
conference, November 2010

  • Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • https://doi.org/10.1109/SC.2010.18

Dynamically Controlled Checkpoint Timing
patent-application, September 2007


A 1 PB/s file system to checkpoint three million MPI tasks
conference, January 2013

  • Rajachandrasekar, Raghunath; Moody, Adam; Mohror, Kathryn
  • Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13
  • https://doi.org/10.1145/2493123.2462908

Enhancing reliability of a storage system by strategic replica placement and migration
patent, April 2017


Non-volatile memory for checkpoint storage
patent, July 2014


Optimization of a Multilevel Checkpoint Model with Uncertain Execution Scales
conference, November 2014

  • Di, Sheng; Bautista-Gomez, Leonardo; Cappello, Franck
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • https://doi.org/10.1109/SC.2014.79

Multi-level memory hierarchy
patent-application, October 2015