Storage of sparse files using parallel log-structured file system
Abstract
A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a single patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.
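The indexing scheme in the abstract can be illustrated with a minimal sketch. It is not the patented implementation and not the PLFS API: the names `IndexEntry`, `SparseLog`, `PatternedEntry`, and `detect_pattern` are hypothetical, the log is an in-memory byte array rather than a parallel file system, and the "write pattern" is assumed to mean equal-length writes at a constant logical stride.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    logical_offset: int   # where the bytes belong in the sparse file
    physical_offset: int  # where the bytes actually sit in the hole-free log
    length: int           # number of bytes in this data portion


@dataclass
class PatternedEntry:
    # One entry standing in for `count` equal-length writes (assumed pattern).
    logical_start: int
    physical_start: int
    length: int
    stride: int
    count: int


class SparseLog:
    """Toy log: data portions are appended back to back, with no holes."""

    def __init__(self) -> None:
        self.log = bytearray()
        self.index: List[IndexEntry] = []

    def write(self, logical_offset: int, data: bytes) -> None:
        # Append at the current end of the log and record the mapping.
        self.index.append(IndexEntry(logical_offset, len(self.log), len(data)))
        self.log.extend(data)

    def read_all(self) -> bytes:
        # Rebuild the logical file; unwritten ranges stay zero (the holes).
        size = max((e.logical_offset + e.length for e in self.index), default=0)
        out = bytearray(size)
        for e in self.index:  # later entries overwrite earlier ones
            chunk = self.log[e.physical_offset:e.physical_offset + e.length]
            out[e.logical_offset:e.logical_offset + e.length] = chunk
        return bytes(out)


def detect_pattern(entries: List[IndexEntry]) -> Optional[PatternedEntry]:
    """Collapse entries into one patterned entry if they form a strided run."""
    if len(entries) < 2:
        return None
    first = entries[0]
    stride = entries[1].logical_offset - first.logical_offset
    for i, e in enumerate(entries):
        if (e.length != first.length
                or e.logical_offset != first.logical_offset + i * stride
                or e.physical_offset != first.physical_offset + i * first.length):
            return None
    return PatternedEntry(first.logical_offset, first.physical_offset,
                          first.length, stride, len(entries))
```

In this sketch, three 2-byte writes at logical offsets 0, 10, and 20 occupy only 6 bytes of log, `read_all()` returns a 22-byte file with zero-filled gaps, and `detect_pattern` reduces the three index entries to a single patterned entry.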
- Research Org.: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.: USDOE
- OSTI Identifier: 1407704
- Patent Number(s): 9811545
- Application Number: 13/921,719
- Assignee: EMC IP Holding Company LLC
- Patent Classifications (CPCs): G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number: AC52-06NA25396
- Resource Type: Patent
- Resource Relation: Patent File Date: 2013 Jun 19
- Country of Publication: United States
- Language: English
- Subject: 97 MATHEMATICS AND COMPUTING
Citation Formats
Bent, John M., Faibish, Sorin, Grider, Gary, and Torres, Aaron. Storage of sparse files using parallel log-structured file system. United States: N. p., 2017. Web.
Bent, John M., Faibish, Sorin, Grider, Gary, & Torres, Aaron. Storage of sparse files using parallel log-structured file system. United States.
Bent, John M., Faibish, Sorin, Grider, Gary, and Torres, Aaron. 2017. "Storage of sparse files using parallel log-structured file system". United States. https://www.osti.gov/servlets/purl/1407704.
@article{osti_1407704,
title = {Storage of sparse files using parallel log-structured file system},
author = {Bent, John M. and Faibish, Sorin and Grider, Gary and Torres, Aaron},
abstractNote = {A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a single patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.},
place = {United States},
year = {2017},
month = {11}
}