U.S. Department of Energy
Office of Scientific and Technical Information

Efficient binning for bitmap indices on high-cardinality attributes

Technical Report · DOI: https://doi.org/10.2172/841113 · OSTI ID: 841113

Bitmap indexing is a common technique for indexing high-dimensional data in data warehouses and scientific applications. Though efficient for low-cardinality attributes, query processing can be rather costly for high-cardinality attributes because of the large storage requirements of the bitmap indices. Binning is a common technique for reducing the storage costs of bitmap indices. It partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. Although binning may reduce storage costs, it may increase the access costs of queries whose ranges do not fall on exact bin boundaries and therefore only partially overlap some bins (edge bins). For such queries, the original data values associated with the edge bins must be accessed in order to check them against the query constraints. In this paper we study the problem of finding optimal locations for the bin boundaries in order to minimize these access costs subject to storage constraints. We propose a dynamic programming algorithm for optimal partitioning of attribute values into bins that takes into account query access patterns as well as data distribution statistics. Mathematical analysis and experiments on real-life data sets show that the optimal partitioning achieved by this algorithm can lead to a significant improvement in the access costs of bitmap indexing systems for high-cardinality attributes.
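
The edge-bin cost described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the function names `build_binned_index` and `range_query`, the half-open bin convention, and the use of plain Python lists in place of compressed bit vectors are all assumptions made for illustration. The sketch counts how many original values must be read when a query range cuts through a bin.

```python
from bisect import bisect_right

def build_binned_index(values, boundaries):
    """Build one bitmap per bin (hypothetical helper, not from the report).

    Bin i covers the half-open range [boundaries[i], boundaries[i + 1]).
    Every value is assumed to lie in [boundaries[0], boundaries[-1]).
    """
    n_bins = len(boundaries) - 1
    bitmaps = [[False] * len(values) for _ in range(n_bins)]
    for row, v in enumerate(values):
        b = bisect_right(boundaries, v) - 1  # index of the bin containing v
        bitmaps[b][row] = True
    return bitmaps

def range_query(values, boundaries, bitmaps, lo, hi):
    """Return (matching rows, raw reads) for the query lo <= value < hi.

    Bins fully inside the query are answered from the bitmap alone.
    Edge bins (partial overlap) need a check against the original values;
    raw_reads counts those accesses.
    """
    hits = []
    raw_reads = 0
    for b, bitmap in enumerate(bitmaps):
        b_lo, b_hi = boundaries[b], boundaries[b + 1]
        if b_hi <= lo or b_lo >= hi:
            continue  # bin does not overlap the query range
        fully_covered = lo <= b_lo and b_hi <= hi
        for row, bit in enumerate(bitmap):
            if not bit:
                continue
            if fully_covered:
                hits.append(row)
            else:
                raw_reads += 1  # edge bin: read the original value
                if lo <= values[row] < hi:
                    hits.append(row)
    return sorted(hits), raw_reads

values = [3, 17, 25, 8, 42, 31, 12, 29]
boundaries = [0, 10, 20, 30, 50]  # four bins
bitmaps = build_binned_index(values, boundaries)

aligned = range_query(values, boundaries, bitmaps, 10, 30)
unaligned = range_query(values, boundaries, bitmaps, 15, 30)
print(aligned)    # query endpoints coincide with bin boundaries
print(unaligned)  # lower endpoint falls inside the bin [10, 20)
```

In this toy data set the aligned query needs no raw reads, while the unaligned query must read both values stored in the bin [10, 20). Moving a bin boundary to 15 would remove that cost for this query, which is the kind of trade-off the report's boundary-placement problem addresses.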

Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
841113
Report Number(s):
LBNL--56936
Country of Publication:
United States
Language:
English

Similar Records

Evaluation Strategies for Bitmap Indices with Binning
Conference · Thu Jun 03 00:00:00 EDT 2004 · OSTI ID:861196

Breaking the Curse of Cardinality on Bitmap Indexes
Conference · Fri Apr 04 00:00:00 EDT 2008 · OSTI ID:927150

On the performance of bitmap indices for high cardinality attributes
Conference · Thu Mar 04 23:00:00 EST 2004 · OSTI ID:822860