Optimizing tertiary storage organization and access for spatio-temporal datasets
- Lawrence Berkeley Lab., CA (United States)
- Lawrence Livermore National Lab., CA (United States)
This paper addresses data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. The paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement addressed is efficient access to subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into "clusters" based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize proposed enhancements to current storage server protocols that permit control over the physical placement of data on storage devices. We also discuss in some detail the interface between the application programs and the mass storage system, as well as a workbench that helps scientists design the best reorganization of a dataset for anticipated access patterns.
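The stated goal of minimizing the number of clusters read per request can be illustrated with a minimal sketch. It is not the partitioning method of the paper: it assumes a regular grid of equally sized clusters and box-shaped queries, and it ignores storage device characteristics. The function name `clusters_touched`, the dataset dimensions, and both cluster shapes are hypothetical.

```python
from math import prod


def clusters_touched(query, cluster_shape):
    """Count the regular-grid clusters that a box-shaped query intersects.

    query: list of (start, stop) half-open index ranges, one per dimension.
    cluster_shape: cluster extent along each dimension (hypothetical layout).
    """
    per_dimension = []
    for (start, stop), size in zip(query, cluster_shape):
        first = start // size          # index of the first cluster touched
        last = (stop - 1) // size      # index of the last cluster touched
        per_dimension.append(last - first + 1)
    return prod(per_dimension)


# Hypothetical dataset indexed as (time, latitude, longitude) = (1000, 180, 360).
time_series_query = [(0, 1000), (90, 91), (180, 181)]   # one grid point, all times
map_query = [(500, 501), (0, 180), (0, 360)]            # one time step, whole globe

# Two hypothetical cluster shapes for the same dataset.
layouts = {
    "one cluster per time step": (1, 180, 360),
    "one cluster per 10x10 spatial column": (1000, 10, 10),
}

for name, shape in layouts.items():
    print(name,
          "| time series:", clusters_touched(time_series_query, shape),
          "| map:", clusters_touched(map_query, shape))
```

Under these assumptions, each layout serves one access pattern with a single cluster read and the other with hundreds or more, which is why the choice of partitioning depends on the anticipated access patterns.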
- Research Organization:
- Lawrence Livermore National Lab., CA (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States)
- DOE Contract Number:
- W-7405-ENG-48
- OSTI ID:
- 109681
- Report Number(s):
- UCRL-JC--119758; CONF-9503106--3; ON: DE96000334
- Country of Publication:
- United States
- Language:
- English