Adaptive multi-level checkpointing
Abstract
In some examples, with respect to adaptive multi-level checkpointing, a transfer parameter associated with transfer of checkpoint data from a node-local storage to a parallel file system may be ascertained for the checkpoint data stored in the node-local storage. The transfer parameter may be compared to a specified transfer parameter threshold. A determination may be made, based on the comparison of the transfer parameter to the specified transfer parameter threshold, as to whether to transfer the checkpoint data from the node-local storage to the parallel file system.
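The decision flow in the abstract (ascertain a transfer parameter, compare it to a threshold, decide whether to transfer) can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say what the transfer parameter is or which direction of the comparison triggers a transfer. The sketch assumes the parameter is an estimated transfer time (checkpoint size divided by parallel file system bandwidth) and that data is transferred only when that estimate does not exceed the threshold. The names `Checkpoint`, `estimate_transfer_seconds`, `should_transfer`, and `plan_transfers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Checkpoint:
    """Checkpoint data held in node-local storage (hypothetical record type)."""
    name: str
    size_bytes: int


def estimate_transfer_seconds(checkpoint: Checkpoint,
                              pfs_bandwidth_bytes_per_s: float) -> float:
    """Assumed transfer parameter: estimated time to copy the checkpoint
    from node-local storage to the parallel file system."""
    if pfs_bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return checkpoint.size_bytes / pfs_bandwidth_bytes_per_s


def should_transfer(transfer_parameter: float, threshold: float) -> bool:
    """Assumed policy: transfer only when the parameter does not exceed
    the specified threshold. The abstract leaves the direction open."""
    return transfer_parameter <= threshold


def plan_transfers(checkpoints: Iterable[Checkpoint],
                   pfs_bandwidth_bytes_per_s: float,
                   threshold_seconds: float) -> Dict[str, bool]:
    """Return, per checkpoint name, whether it would be transferred."""
    decisions = {}
    for cp in checkpoints:
        parameter = estimate_transfer_seconds(cp, pfs_bandwidth_bytes_per_s)
        decisions[cp.name] = should_transfer(parameter, threshold_seconds)
    return decisions
```

For example, with an assumed bandwidth of 1 GB/s and a 30-second threshold, `plan_transfers([Checkpoint("small", 10**9), Checkpoint("large", 10**11)], 1e9, 30.0)` would mark only the 1 GB checkpoint for transfer. A real system could use a different parameter or the opposite comparison.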
- Inventors:
- Xu, Cong; Akgun, Itir; Faraboschi, Paolo
- Issue Date:
- Research Org.:
- Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1735238
- Patent Number(s):
- 10769017
- Application Number:
- 15/960,302
- Assignee:
- Hewlett-Packard Development Company, L.P. (Houston, TX)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- H - ELECTRICITY; H04 - ELECTRIC COMMUNICATION TECHNIQUE; H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- DOE Contract Number:
- AC52-07NA27344
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 04/23/2018
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Xu, Cong, Akgun, Itir, and Faraboschi, Paolo. Adaptive multi-level checkpointing. United States: N. p., 2020. Web.
Xu, Cong, Akgun, Itir, & Faraboschi, Paolo. Adaptive multi-level checkpointing. United States.
Xu, Cong, Akgun, Itir, and Faraboschi, Paolo. Tue . "Adaptive multi-level checkpointing". United States. https://www.osti.gov/servlets/purl/1735238.
@article{osti_1735238,
title = {Adaptive multi-level checkpointing},
author = {Xu, Cong and Akgun, Itir and Faraboschi, Paolo},
abstractNote = {In some examples, with respect to adaptive multi-level checkpointing, a transfer parameter associated with transfer of checkpoint data from a node-local storage to a parallel file system may be ascertained for the checkpoint data stored in the node-local storage. The transfer parameter may be compared to a specified transfer parameter threshold. A determination may be made, based on the comparison of the transfer parameter to the specified transfer parameter threshold, as to whether to transfer the checkpoint data from the node-local storage to the parallel file system.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {9}
}
Works referenced in this record:
Storage of bursty data using multiple storage tiers with heterogeneous device storage
patent, March 2018
- Bent, John M.; Faibish, Sorin; Gupta, Uday
- US Patent Document 9,916,311
Checkpointing using compute node health information
patent-application, June 2019
- Andrade Costa, Carlos Henrique; Park, Yoonho; Cher, Chen-Yong
- US Patent Application 15/853343; 20190196920
Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems
journal, February 2017
- Wan, Lipeng; Cao, Qing; Wang, Feiyi
- Journal of Parallel and Distributed Computing, Vol. 100
Method and System for Enabling Checkpointing Fault Tolerance Across Remote Virtual Machines
patent-application, November 2011
- Agesen, Ole; Mummidi, Raviprasad; Subrahmanyam, Pratap
- US Patent Application 12/781875; 2011/0289345
Storage system with distributed tiered parallel file system comprising software-defined unified memory cluster
patent, December 2018
- Faibish, Sorin; Cote, Dominique; Teymouri, Sassan
- US Patent Document 10,157,003
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
conference, November 2010
- Moody, Adam; Bronevetsky, Greg; Mohror, Kathryn
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Dynamically Controlled Checkpoint Timing
patent-application, September 2007
- Ruscio, Joseph; Jones, Nicholas
- US Patent Application 11/535431; 20070220327
A 1 PB/s file system to checkpoint three million MPI tasks
conference, January 2013
- Rajachandrasekar, Raghunath; Moody, Adam; Mohror, Kathryn
- Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13
Enhancing reliability of a storage system by strategic replica placement and migration
patent, April 2017
- Iliadis, Ilias; Kolodner, Elliot K.; Sotnikov, Dmitry
- US Patent Document 9,635,109
Method and system for dynamically collecting data for checkpoint tuning and reduce recovery time
patent, August 2011
- Saha, Abhijit; Bhowmik, Sudip
- US Patent Document 7,991,744
Non-volatile memory for checkpoint storage
patent, July 2014
- Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.
- US Patent Document 8,788,879
Optimization of a Multilevel Checkpoint Model with Uncertain Execution Scales
conference, November 2014
- Di, Sheng; Bautista-Gomez, Leonardo; Cappello, Franck
- SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
Multi-level memory hierarchy
patent-application, October 2015
- Hsu, Lisa R.; O'Connor, James M.; Sridharan, Vilas K.
- US Patent Application 14/250474; 20150293845
Method, apparatus, and computer program product for design and selection of an I/O subsystem of a supercomputer
patent, May 2017
- Tzelnic, Percy; Faibish, Sorin; Gupta, Uday
- US Patent Document 9,652,568