Title: Compressing bitmap indices by data reorganization

Conference
OSTI ID: 841700

Many scientific applications generate massive volumes of data through observations or computer simulations, creating a need for indexing methods that support efficient storage and retrieval of scientific data. Unlike conventional databases, scientific data is mostly read-only and its volume can reach the order of petabytes, making a compact index structure vital. Bitmap indexing has been successfully applied to scientific databases by exploiting the fact that scientific data is enumerated or numerical. Bitmap indices can be compressed with variants of run-length encoding to obtain a compact index structure. However, even this may not be enough for the enormous volumes of data generated in some applications, such as high energy physics. In this paper, we study how to reorganize bitmap tables for improved compression rates. Our algorithms are used only as a preprocessing step, so there is no need to revise current indexing techniques or query processing algorithms. We introduce the tuple reordering problem, which aims to reorganize database tuples for optimal compression rates. We propose a Gray code ordering algorithm for this NP-complete problem; the algorithm is in-place and runs in time linear in the size of the database. We also discuss how the tuple reordering problem can be reduced to the traveling salesperson problem. Our experimental results on real data sets show that the compression ratio can be improved by a factor of 4 to 7.
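
To illustrate why Gray code ordering helps run-length compression, here is a minimal Python sketch. It is not the paper's in-place, linear-time algorithm: it simply sorts the rows of a small bitmap table by their position in the reflected binary Gray code sequence, then counts the runs in each column as a rough proxy for run-length-encoded size. The function names (`gray_rank`, `gray_order`, `count_runs`) and the sample table are assumptions made for this example.

```python
from itertools import accumulate
from operator import xor


def gray_rank(row):
    """Return a sort key giving the row's position in reflected Gray code order.

    Gray-to-binary conversion: each binary bit is the XOR of all Gray code
    bits up to and including that position (a prefix XOR).
    """
    return tuple(accumulate(row, xor))


def gray_order(rows):
    """Reorder bitmap rows (tuples of 0/1) into Gray code order.

    This uses a comparison sort for clarity; it is an illustrative
    stand-in, not the in-place linear-time method the abstract describes.
    """
    return sorted(rows, key=gray_rank)


def count_runs(rows):
    """Total number of runs of identical bits, summed over all columns.

    Fewer runs means each column compresses better under run-length encoding.
    """
    total = 0
    for col in zip(*rows):
        total += 1 + sum(a != b for a, b in zip(col, col[1:]))
    return total


# Hypothetical bitmap table: one tuple per database row, one bit per bitmap column.
rows = [
    (0, 1, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0),
    (0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1),
]

reordered = gray_order(rows)
print("runs before reordering:", count_runs(rows))
print("runs after Gray code ordering:", count_runs(reordered))
```

Because consecutive rows in Gray code order differ in few bit positions, the bits in each column tend to form longer runs than they do in arbitrary or plain lexicographic order.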

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research. Mathematical Information and Computational Sciences Division (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
841700
Report Number(s):
LBNL-55690; ICDE 2005; R&D Project: 365968; TRN: US200515%%324
Resource Relation:
Conference: International Conference on Data Engineering, Tokyo (JP), 04/05/2005--04/08/2005; Other Information: PBD: 1 Jul 2004
Country of Publication:
United States
Language:
English