U.S. Department of Energy
Office of Scientific and Technical Information

Optimizing tertiary storage organization and access for spatio-temporal datasets

Technical Report · DOI: https://doi.org/10.2172/109681 · OSTI ID: 109681
; ;  [1]; ; ;  [2]
  1. Lawrence Berkeley Lab., CA (United States)
  2. Lawrence Livermore National Lab., CA (United States)

We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into "clusters" based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We also discuss in some detail the aspects of the interface between the application programs and the mass storage system, as well as a workbench to help scientists to design the best reorganization of a dataset for anticipated access patterns.
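
The abstract's stated goal, minimizing the number of clusters read when a subset is requested, can be illustrated with a small sketch. It is not the report's algorithm, which the abstract does not specify. It assumes a regular tiling of a hypothetical (time, latitude, longitude) grid and hyper-rectangular queries, and the names `clusters_touched` and `total_reads`, the grid dimensions, and the two candidate cluster shapes are all made up for illustration.

```python
from math import prod


def clusters_touched(query, cluster_shape):
    """Count regular-grid clusters intersected by a hyper-rectangular query.

    query: list of (start, stop) half-open index ranges, one per dimension.
    cluster_shape: cluster extent along each dimension, in grid cells.
    Assumption: clusters tile the dataset regularly, starting at index 0.
    """
    counts = []
    for (start, stop), size in zip(query, cluster_shape):
        first = start // size        # index of the first cluster hit
        last = (stop - 1) // size    # index of the last cluster hit
        counts.append(last - first + 1)
    return prod(counts)


def total_reads(queries, cluster_shape):
    """Total clusters read for an anticipated access pattern."""
    return sum(clusters_touched(q, cluster_shape) for q in queries)


# Hypothetical dataset of 120 time steps on a 64 x 128 spatial grid.
# Anticipated access pattern: full time series over small 8 x 8 regions.
queries = [
    [(0, 120), (8, 16), (32, 40)],
    [(0, 120), (40, 48), (96, 104)],
]

# Two candidate cluster shapes of comparable size (8192 vs 7680 cells).
by_time_slice = (1, 64, 128)  # one cluster per time step
by_region = (120, 8, 8)       # one cluster per spatial tile, all time steps

reads_time_slice = total_reads(queries, by_time_slice)
reads_region = total_reads(queries, by_region)
print(reads_time_slice, reads_region)
```

Under these assumptions the time-slice layout forces each query to read every time step's cluster, while the region layout serves each query from a single cluster. A different anticipated access pattern, such as full spatial maps at one time step, would reverse the ranking, which is why the partitioning is driven by the access patterns.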

Research Organization:
Lawrence Livermore National Lab., CA (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
109681
Report Number(s):
UCRL-JC--119758; CONF-9503106--3; ON: DE96000334
Country of Publication:
United States
Language:
English
