skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Efficient binning for bitmap indices on high-cardinality attributes

Technical Report ·
DOI:https://doi.org/10.2172/841113· OSTI ID:841113

Bitmap indexing is a common technique for indexing high-dimensional data in data warehouses and scientific applications. Though efficient for low-cardinality attributes, query processing can be rather costly for high-cardinality attributes due to the large storage requirements for the bitmap indices. Binning is a common technique for reducing storage costs of bitmap indices. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. Although binning may reduce storage costs, it may increase the access costs of queries that do not fall on exact bin boundaries (edge bins). For this kind of queries the original data values associated with edge bins must be accessed, in order to check them against the query constraints.In this paper we study the problem of finding optimal locations for the bin boundaries in order to minimize these access costs subject to storage constraints. We propose a dynamic programming algorithm for optimal partitioning of attribute values into bins that takes into account query access patterns as well as data distribution statistics. Mathematical analysis and experiments on real life data sets show that the optimal partitioning achieved by this algorithm can lead to a significant improvement in the access costs of bitmap indexing systems for high-cardinality attributes.

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
841113
Report Number(s):
LBNL-56936; R&D Project: 429201; TRN: US200513%%337
Resource Relation:
Other Information: PBD: 17 Nov 2004
Country of Publication:
United States
Language:
English