U.S. Department of Energy
Office of Scientific and Technical Information

File caching in data intensive scientific applications

Conference ·
OSTI ID:882745
We present some theoretical and experimental results for an important caching problem that arises frequently in data-intensive scientific applications. In such applications, jobs need to process several files simultaneously, i.e., a job can be serviced only if all of its needed files are present in the disk cache. The set of files requested by a job is called a file-bundle. This requirement introduces the need for cache replacement algorithms based on file-bundles rather than individual files. We show that traditional caching algorithms such as Least Recently Used (LRU) and GreedyDual-Size (GDS) are not optimal in this case, since they are not sensitive to file-bundles and may hold non-relevant combinations of files in the cache. In this paper we propose and analyze a new cache replacement algorithm specifically adapted to deal with file-bundles. We tested the new algorithm using a disk cache simulation model under a wide range of parameters, such as file request distributions, relative cache size, file size distribution, and queue size. In all these tests, the results show significant improvement over traditional caching algorithms such as GDS.
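To make the file-bundle requirement concrete, here is a toy simulation sketch. It is not the algorithm proposed in the paper (the abstract does not describe it); the function `simulate`, its parameters `bundle_aware` and `queue_len`, and the eviction rule are hypothetical illustrations. A request counts as a hit only if its whole bundle is already cached. The baseline evicts the least recently used file regardless of bundles. The bundle-aware variant first evicts files that no bundle in a lookahead window of `queue_len` requests still needs, a simple heuristic loosely standing in for a queue of waiting jobs. The sketch assumes every bundle fits in the cache.

```python
from collections import OrderedDict


def simulate(requests, sizes, capacity, bundle_aware, queue_len=1):
    """Count requests whose entire file-bundle was already cached on arrival.

    requests: list of tuples of file names (one tuple = one file-bundle).
    sizes: dict mapping file name -> size.
    bundle_aware=False gives an LRU-style baseline; True (a hypothetical
    heuristic) prefers evicting files that no bundle in the next
    `queue_len` requests still needs.
    """
    cache = OrderedDict()  # file -> size, least recently used first
    hits = 0
    for i, bundle in enumerate(requests):
        # A job is serviceable from cache only if ALL of its files are present.
        if all(f in cache for f in bundle):
            hits += 1
        # Files needed by this request and by the requests queued behind it.
        pending = set().union(*requests[i:i + queue_len])
        for f in bundle:
            if f in cache:
                cache.move_to_end(f)  # mark as most recently used
                continue
            while sum(cache.values()) + sizes[f] > capacity:
                # Never evict a file of the bundle currently being staged.
                victims = [v for v in cache if v not in bundle]
                if not victims:
                    raise ValueError("bundle does not fit in the cache")
                if bundle_aware:
                    idle = [v for v in victims if v not in pending]
                    victims = idle or victims
                del cache[victims[0]]  # least recently used candidate
            cache[f] = sizes[f]
    return hits


requests = [("a", "b"), ("c",), ("d",), ("a", "b")]
sizes = {name: 1 for name in "abcd"}
print(simulate(requests, sizes, capacity=3, bundle_aware=False))              # 0
print(simulate(requests, sizes, capacity=3, bundle_aware=True, queue_len=4))  # 1
```

In this small trace the LRU-style baseline evicts `a` to make room for `d`, leaving `b` cached on its own, which is useless for the later `("a", "b")` job: exactly the kind of non-relevant combination described above. The bundle-aware variant evicts `c` instead and keeps the bundle intact.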
Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
USDOE Director. Office of Science. Advanced Scientific Computing Research
DOE Contract Number:
AC02-05CH11231
OSTI ID:
882745
Report Number(s):
LBNL--55587; BnR: KJ0101030
Country of Publication:
United States
Language:
English

Similar Records

Optimal file-bundle caching algorithms for data-grids
Conference · Sat Apr 24 00:00:00 EDT 2004 · OSTI ID:824286

Efficient algorithms for multi-file caching
Conference · Sun Mar 14 23:00:00 EST 2004 · OSTI ID:824285

Accurate modeling of cache replacement policies in a Data-Grid.
Conference · Wed Jan 22 23:00:00 EST 2003 · OSTI ID:815511