Storing files in a parallel computing system based on user-specified parser function
Abstract
Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system is stored by obtaining a parser from the distributed application for processing the files prior to storage, and then storing one or more of the files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprises one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files; the extracted metadata can be stored with one or more of the files and used to search for files.
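The flow described in the abstract can be illustrated with a minimal sketch. Everything named here (`StorageNode`, `store_files`, `search`, `checkpoint_parser`, the `CKPT` header check, and the hash-based node placement) is a hypothetical stand-in chosen for illustration, not the patented implementation or any real file system API. The sketch assumes a parser is a callable that returns metadata for a file it accepts and `None` for a file that fails its semantic requirements.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration only.
Metadata = Dict[str, str]
# A parser returns metadata for an accepted file, or None to reject it.
Parser = Callable[[str, bytes], Optional[Metadata]]


@dataclass
class StorageNode:
    """In-memory stand-in for a storage node."""
    files: Dict[str, bytes] = field(default_factory=dict)
    metadata: Dict[str, Metadata] = field(default_factory=dict)


def store_files(files: Dict[str, bytes], parser: Parser,
                nodes: List[StorageNode]) -> List[str]:
    """Run the application-supplied parser on each file before storage.

    Only files the parser accepts are stored; the extracted metadata is
    kept alongside the file. Returns the names of the stored files.
    """
    stored = []
    for name, data in files.items():
        meta = parser(name, data)
        if meta is None:
            continue  # file does not satisfy the parser's requirements
        # Assumed placement policy: deterministic hash of the file name.
        node = nodes[zlib.crc32(name.encode()) % len(nodes)]
        node.files[name] = data
        node.metadata[name] = meta
        stored.append(name)
    return stored


def search(nodes: List[StorageNode], key: str, value: str) -> List[str]:
    """Find stored files whose extracted metadata has key == value."""
    return sorted(name
                  for node in nodes
                  for name, meta in node.metadata.items()
                  if meta.get(key) == value)


def checkpoint_parser(name: str, data: bytes) -> Optional[Metadata]:
    """Example parser: accept only files that start with a 'CKPT' header."""
    if not data.startswith(b"CKPT"):
        return None
    return {"kind": "checkpoint", "size": str(len(data))}
```

In this sketch an application would call `store_files(files, checkpoint_parser, nodes)` with its own parser, and later call `search(nodes, "kind", "checkpoint")` to locate files by the metadata the parser extracted.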
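- Inventors:
- Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron
- Issue Date:
- 2014 Oct 21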
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1160235
- Patent Number(s):
- 8868576
- Application Number:
- 13/536,369
- Assignee:
- EMC Corporation (Hopkinton, MA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC52-06NA25396
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2012 Jun 28
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Faibish, Sorin, Bent, John M, Tzelnic, Percy, Grider, Gary, Manzanares, Adam, and Torres, Aaron. Storing files in a parallel computing system based on user-specified parser function. United States: N. p., 2014. Web.
Faibish, Sorin, Bent, John M, Tzelnic, Percy, Grider, Gary, Manzanares, Adam, & Torres, Aaron. Storing files in a parallel computing system based on user-specified parser function. United States.
Faibish, Sorin, Bent, John M, Tzelnic, Percy, Grider, Gary, Manzanares, Adam, and Torres, Aaron. 2014. "Storing files in a parallel computing system based on user-specified parser function". United States. https://www.osti.gov/servlets/purl/1160235.
@article{osti_1160235,
title = {Storing files in a parallel computing system based on user-specified parser function},
author = {Faibish, Sorin and Bent, John M and Tzelnic, Percy and Grider, Gary and Manzanares, Adam and Torres, Aaron},
abstractNote = {Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system is stored by obtaining a parser from the distributed application for processing the files prior to storage, and then storing one or more of the files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprises one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files; the extracted metadata can be stored with one or more of the files and used to search for files.},
place = {United States},
year = {2014},
month = {10}
}