Improving Data Availability for Better Access Performance: A Study on Caching Scientific Data on Distributed Workstations
- ORNL
Client-side data caching serves as an excellent mechanism to store and analyze the rapidly growing amount of scientific data. In our previous work, we built a distributed local cache on unreliable desktop storage contributions. This offers several desirable properties, such as performance impedance matching, improved space utilization, and high parallel I/O bandwidth. Such a low-cost, best-effort cache, however, is faced with the vagaries of storage node availability: these donated machines may be significantly less reliable than dedicated systems and cannot be controlled centrally. In this paper, we address the performance impact of data availability in the distributed scientific data cache setting. We then present a novel approach to storage cache management, remote partial data recovery (RPDR). We compare our approach to two standard techniques, namely replication and erasure coding, both extended to the target caching environment. Our evaluation uses a trace-driven simulation parameterized with benchmarking results from our distributed cache prototype. The results with multiple real-world traces indicate that RPDR significantly outperforms both replication and erasure coding in many cases and that, overall, the combination of RPDR and erasure coding yields the best performance.
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Laboratory Directed Research and Development (LDRD) Program
- DOE Contract Number:
- DE-AC05-00OR22725
- OSTI ID:
- 1000706
- Journal Information:
- Journal of Grid Computing, Vol. 7, Issue 4; ISSN 1570-7873
- Country of Publication:
- United States
- Language:
- English
Similar Records
Constructing Collaborative Desktop Storage Caches for Large Scientific Datasets
Automated Cache Performance Analysis And Optimization