skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: File caching in data intensive scientific applications

Conference ·
OSTI ID:882745

We present some theoretical and experimental results of animportant caching problem that arises frequently in data intensivescientific applications. In such applications, jobs need to processseveral files simultaneously, i.e., a job can only be serviced if all itsneeded files are present in the disk cache. The set of files requested bya job is called a file-bundle. This requirement introduces the need forcache replacement algorithms based on file-bundles rather then individualfiles. We show that traditional caching algorithms such Least RecentlyUsed (LRU), and GreedyDual-Size (GDS), are not optimal in this case sincethey are not sensitive to file-bundles and may hold in the cachenon-relevant combinations of files. In this paper we propose and analyzea new cache replacement algorithm specifically adapted to deal withfile-bundles. We tested the new algorithm using a disk cache simulationmodel under a wide range of parameters such as file requestdistributions, relative cache size, file size distribution,and queuesize. In all these tests, the results show significant improvement overtraditional caching algorithms such as GDS.

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Advanced ScientificComputing Research
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
882745
Report Number(s):
LBNL-55587; R&D Project: 429201; BnR: KJ0101030
Resource Relation:
Conference: 1st International Workshop on Data Management inGrids, Trondheim, Norway, September 2 - 3, 2005
Country of Publication:
United States
Language:
English