Accelerating shared file checkpoint with local burst buffers
Abstract
A data management system and method for accelerating shared file checkpointing. Written application data is aggregated in an application data file created in a local burst buffer memory at a compute node, and an associated data mapping index is built to maintain information about the offsets into a shared file at which segments of the application data are to be stored in a parallel file system, and where in the buffer those segments are located. The node asynchronously transfers the data file containing the application data, together with the associated data mapping index, to a file server for shared file storage. The data management system and method further accelerate shared file checkpointing by asynchronously transferring a shared file, together with a map file that specifies how the shared file is to be distributed, to local burst buffer memories at the nodes to accelerate reading of the shared file.
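The write path described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `IndexEntry`, `BurstBufferLog`, and `drain` are hypothetical, in-memory `io.BytesIO` objects stand in for both the node-local burst buffer file and the shared file, and the transfer runs synchronously here, whereas the abstract describes an asynchronous transfer to a file server.

```python
import io
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    shared_offset: int  # offset in the shared file where the segment belongs
    buffer_offset: int  # where the segment sits in the node-local data file
    length: int         # segment length in bytes


@dataclass
class BurstBufferLog:
    """Hypothetical per-node log: appended data plus a mapping index."""
    data: io.BytesIO = field(default_factory=io.BytesIO)  # stand-in for the local data file
    index: list = field(default_factory=list)

    def write(self, shared_offset: int, payload: bytes) -> None:
        # Append the payload locally and record where it belongs in the shared file.
        buffer_offset = self.data.seek(0, io.SEEK_END)
        self.data.write(payload)
        self.index.append(IndexEntry(shared_offset, buffer_offset, len(payload)))


def drain(log: BurstBufferLog, shared_file: io.BytesIO) -> None:
    """Replay the index, placing each segment at its shared-file offset."""
    buf = log.data.getvalue()
    for entry in log.index:
        shared_file.seek(entry.shared_offset)
        shared_file.write(buf[entry.buffer_offset:entry.buffer_offset + entry.length])


def demo() -> bytes:
    # Two nodes write interleaved regions of one shared checkpoint file.
    node0, node1 = BurstBufferLog(), BurstBufferLog()
    node0.write(0, b"AAAA")
    node1.write(4, b"BBBB")
    node0.write(8, b"CCCC")
    shared = io.BytesIO()  # stand-in for the file on the parallel file system
    for log in (node0, node1):
        drain(log, shared)
    return shared.getvalue()
```

In this sketch each node only ever appends to its own local file, so the application's writes stay sequential and local; the index alone carries the information needed to place each segment at its shared-file offset later.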
- Inventors:
- Issue Date:
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1892825
- Patent Number(s):
- 11301165
- Application Number:
- 15/963,700
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B604142
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 04/26/2018
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Gooding, Thomas, Lemarinier, Pierre, and Rosenburg, Bryan S. Accelerating shared file checkpoint with local burst buffers. United States: N. p., 2022. https://www.osti.gov/servlets/purl/1892825.
@article{osti_1892825,
title = {Accelerating shared file checkpoint with local burst buffers},
author = {Gooding, Thomas and Lemarinier, Pierre and Rosenburg, Bryan S.},
abstractNote = {A data management system and method for accelerating shared file checkpointing. Written application data is aggregated in an application data file created in a local burst buffer memory at a compute node, and an associated data mapping index is built to maintain information about the offsets into a shared file at which segments of the application data are to be stored in a parallel file system, and where in the buffer those segments are located. The node asynchronously transfers the data file containing the application data, together with the associated data mapping index, to a file server for shared file storage. The data management system and method further accelerate shared file checkpointing by asynchronously transferring a shared file, together with a map file that specifies how the shared file is to be distributed, to local burst buffer memories at the nodes to accelerate reading of the shared file.},
place = {United States},
year = {2022},
month = {4}
}
Works referenced in this record:
PLFS: a checkpoint filesystem for parallel applications
conference, January 2009
- Bent, John; Gibson, Garth; Grider, Gary
A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers
conference, May 2014
- Sato, Kento; Mohror, Kathryn; Moody, Adam
- 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Parallel compression of data chunks of a shared data object using a log-structured file system
patent, October 2016
- Bent, John M.; Faibish, Sorin; Grider, Gary
- US Patent Document 9,477,682
Architecture and method for a burst buffer using flash technology
patent, March 2016
- Tzelnic, Percy; Faibish, Sorin; Gupta, Uday
- US Patent Document 9,286,261
Method and System For Data Transfer Between Compute Clusters And File System
patent-application, November 2014
- Uppu, Pavan Kumar; Cope, Jason Micah; Nowoczynski, Paul
- US Patent Application 14/045,170; 2014/0351300
Method and system for data migration between high performance computing architectures and file system using distributed parity group information structures with non-deterministic data addressing
patent, October 2016
- Piszczek, Michael J.; Cope, Jason M.; Nowoczynski, Paul
- US Patent Document 9,477,551
Minimizing Micro-Interruptions in High-Performance Computing
patent-application, November 2014
- Nowoczynski, Paul; Vildibill, Michael; Cope, Jason
- US Patent Application 14/274,391; 2014/0337557 A1
Burst buffer appliance with small file aggregation
patent, March 2015
- Faibish, Sorin; Bent, John M.
- US Patent Document 8,972,465
Integrated in-system storage architecture for high performance computing
conference, June 2012
- Kimpe, Dries; Mohror, Kathryn; Moody, Adam
- Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Centralized Parallel Burst Engine for High Performance Computing
patent-application, May 2015
- Weber, Bret S.
- US Patent Application 14/078,854; 2015/0134780 A1
How Much SSD Is Useful for Resilience in Supercomputers
conference, January 2015
- Fang, Aiman; Chien, Andrew A.
- Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale - FTXS '15
Cluster file system comprising multiple burst buffers each including virtual file system and metadata server components
patent, August 2018
- Faibish, Sorin; Bent, John M.; Zhang, Jingwang
- US Patent Document 10,049,122
Metadata compression
patent, February 2020
- Bent, John M.; Faibish, Sorin; Zhang, Zhenhua
- US Patent Document 10,558,618
BurstMem: A high-performance burst buffer system for scientific applications
conference, October 2014
- Wang, Teng; Oral, Sarp; Wang, Yandong
- 2014 IEEE International Conference on Big Data (Big Data)