OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices

Abstract

Large scale scientific data is often stored in scientific data formats such as FITS, netCDF and HDF. These storage formats are of particular interest to the scientific user community since they provide multi-dimensional storage and retrieval. However, one of the drawbacks of these storage formats is that they do not support semantic indexing, which is important for interactive data analysis where scientists look for features of interest such as "Find all supernova explosions where energy > 10^5 and temperature > 10^6". In this paper we present a novel approach called HDF5-FastQuery to accelerate the data access of large HDF5 files by introducing multi-dimensional semantic indexing. Our implementation leverages an efficient indexing technology called "bitmap indexing" that has been widely used in the database community. Bitmap indices are especially well suited for interactive exploration of large-scale read-only data. Storing the bitmap indices in the HDF5 file has the following advantages: (a) significant performance speedup of accessing subsets of multi-dimensional data and (b) portability of the indices across multiple computer platforms. We will present an API that simplifies the execution of queries on HDF5 files for general scientific applications and data analysis. The design is flexible enough to accommodate the use of arbitrary indexing technology for semantic range queries. We will also provide a detailed performance analysis of HDF5-FastQuery for both synthetic and scientific data. The results demonstrate that our proposed approach for multi-dimensional queries is up to a factor of 2 faster than HDF5.
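
The idea behind the bitmap-indexed range query in the abstract can be illustrated with a minimal sketch. This is not the HDF5-FastQuery API and does not touch HDF5 files; the function names, bin edges, and sample values are all hypothetical. It also assumes the example thresholds are 10^5 and 10^6, and that each threshold coincides with a bin edge so the bitmaps alone give an exact answer. Each bin gets one bitmap (a Python integer used as a bitset), and a multi-dimensional query becomes bitwise OR within a variable and bitwise AND across variables.

```python
from bisect import bisect_left


def build_bitmap_index(values, edges):
    """Build one bitmap per bin; bit `row` is set in the bin holding values[row].

    Bin k holds values in (edges[k-1], edges[k]]; the last bin is open-ended.
    """
    bitmaps = [0] * (len(edges) + 1)
    for row, value in enumerate(values):
        bitmaps[bisect_left(edges, value)] |= 1 << row
    return bitmaps


def greater_than(bitmaps, edges, threshold):
    """Return the bitmap of rows with value > threshold.

    The threshold must be one of the bin edges (an assumption of this sketch);
    otherwise edges.index raises ValueError.
    """
    j = edges.index(threshold)
    result = 0
    for bitmap in bitmaps[j + 1:]:  # OR together every bin strictly above the edge
        result |= bitmap
    return result


def rows(bitmap):
    """Decode a bitmap into a sorted list of row numbers."""
    out, row = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(row)
        bitmap >>= 1
        row += 1
    return out


# Hypothetical sample data: one entry per record.
energy = [5e4, 2e5, 3e5, 1e5, 9e5]
temperature = [2e6, 5e5, 3e6, 4e6, 1e6]
energy_edges = [1e4, 1e5, 1e6]
temperature_edges = [1e5, 1e6, 1e7]

energy_index = build_bitmap_index(energy, energy_edges)
temperature_index = build_bitmap_index(temperature, temperature_edges)

# "energy > 1e5 AND temperature > 1e6" is a single bitwise AND of two bitmaps.
hits = (greater_than(energy_index, energy_edges, 1e5)
        & greater_than(temperature_index, temperature_edges, 1e6))
matching_rows = rows(hits)
```

Because the index is built once and only read afterwards, this style of lookup fits the read-only data the abstract mentions; a threshold that falls inside a bin would additionally require checking the raw values in that bin.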

Authors:
Gosink, Luke; Shalf, John; Stockinger, Kurt; Wu, Kesheng; Bethel, Wes
Publication Date:
December 2005
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research
OSTI Identifier:
881619
Report Number(s):
LBNL-59602-Ext.-Abs.
R&D Project: K11107; BnR: KJ0101030; TRN: US200612%%899
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: 2005 HDF Workshop, San Francisco, CA, 11/30/2005-12/02/2005
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; COMPUTERS; DATA ANALYSIS; DESIGN; EXPLORATION; EXPLOSIONS; IMPLEMENTATION; PERFORMANCE; STORAGE; semantic range queries HDF5 query-driven visualization

Citation Formats

Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng, and Bethel, Wes. HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices. United States: N. p., 2005. Web.
Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng, & Bethel, Wes. HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices. United States.
Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng, and Bethel, Wes. Wed . "HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices". United States. https://www.osti.gov/servlets/purl/881619.
@article{osti_881619,
title = {HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices},
author = {Gosink, Luke and Shalf, John and Stockinger, Kurt and Wu, Kesheng and Bethel, Wes},
abstractNote = {Large scale scientific data is often stored in scientific data formats such as FITS, netCDF and HDF. These storage formats are of particular interest to the scientific user community since they provide multi-dimensional storage and retrieval. However, one of the drawbacks of these storage formats is that they do not support semantic indexing, which is important for interactive data analysis where scientists look for features of interest such as ``Find all supernova explosions where energy > $10^5$ and temperature > $10^6$''. In this paper we present a novel approach called HDF5-FastQuery to accelerate the data access of large HDF5 files by introducing multi-dimensional semantic indexing. Our implementation leverages an efficient indexing technology called ``bitmap indexing'' that has been widely used in the database community. Bitmap indices are especially well suited for interactive exploration of large-scale read-only data. Storing the bitmap indices in the HDF5 file has the following advantages: (a) significant performance speedup of accessing subsets of multi-dimensional data and (b) portability of the indices across multiple computer platforms. We will present an API that simplifies the execution of queries on HDF5 files for general scientific applications and data analysis. The design is flexible enough to accommodate the use of arbitrary indexing technology for semantic range queries. We will also provide a detailed performance analysis of HDF5-FastQuery for both synthetic and scientific data. The results demonstrate that our proposed approach for multi-dimensional queries is up to a factor of 2 faster than HDF5.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2005},
month = {12}
}
