Title: On the performance of bitmap indices for high cardinality attributes

Conference · OSTI ID: 822860

It is well established that bitmap indices are efficient for read-only attributes with a small number of distinct values. For an attribute with a large number of distinct values, the size of the bitmap index can be very large. To overcome this size problem, specialized compression schemes are used. Even though there is empirical evidence that some of these compression schemes work well, there has not been any systematic analysis of their effectiveness. In this paper, we analyze the time and space complexities of the two most efficient bitmap compression techniques known, the Byte-aligned Bitmap Code (BBC) and the Word-Aligned Hybrid (WAH) code, and study their performance on high cardinality attributes. Our analyses indicate that both compression schemes are optimal in time: the time and space required to operate on two compressed bitmaps are proportional to the total size of the two bitmaps. We demonstrate further that an in-place OR algorithm can operate on a large number of sparse bitmaps in time linear in their total size. Our analyses also show that the compressed indices are relatively small compared with commonly used indices such as B-trees. Given these facts, we conclude that bitmap indices are efficient on attributes of low cardinalities as well as on those of high cardinalities. We also verify the analytical results with extensive tests, and identify an optimal way to combine different options to achieve the best performance. The test results confirm the linearity in the total size of the compressed bitmaps, and that WAH outperforms BBC by about a factor of two.
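
To illustrate why an in-place OR over sparse compressed bitmaps can run in time proportional to their compressed size, here is a minimal Python sketch. It is not the paper's implementation and does not reproduce the actual WAH or BBC bit layouts. It assumes a simplified word-aligned scheme in which a bitmap is cut into 31-bit groups and runs of identical all-0 or all-1 groups collapse into a single fill entry. The names `GROUP_BITS`, `groups_from_positions`, `compress`, and `or_in_place` are hypothetical, chosen only for this example.

```python
GROUP_BITS = 31                      # assumed group width (a 32-bit word minus one flag bit)
ALL_ONES = (1 << GROUP_BITS) - 1     # a group with every bit set


def groups_from_positions(positions, n_groups):
    """Build an uncompressed bitmap (list of 31-bit groups) with the given bits set."""
    groups = [0] * n_groups
    for p in positions:
        groups[p // GROUP_BITS] |= 1 << (p % GROUP_BITS)
    return groups


def compress(groups):
    """Collapse runs of all-0 or all-1 groups into ('fill', bit, count) entries.

    Any other group is kept verbatim as a ('lit', value) entry.
    """
    out = []
    for g in groups:
        if g == 0 or g == ALL_ONES:
            bit = 1 if g else 0
            if out and out[-1][0] == "fill" and out[-1][1] == bit:
                # Extend the current run instead of starting a new entry.
                out[-1] = ("fill", bit, out[-1][2] + 1)
            else:
                out.append(("fill", bit, 1))
        else:
            out.append(("lit", g))
    return out


def or_in_place(target, compressed):
    """OR one compressed bitmap into an uncompressed target, modifying it in place.

    A 0-fill is skipped in constant time regardless of its length, so for a
    sparse bitmap the work is proportional to the number of compressed entries.
    A 1-fill costs time proportional to its length.
    """
    pos = 0
    for entry in compressed:
        if entry[0] == "lit":
            target[pos] |= entry[1]
            pos += 1
        else:
            _, bit, count = entry
            if bit:
                for i in range(pos, pos + count):
                    target[i] = ALL_ONES
            pos += count
    return target
```

A possible use, under the same assumptions: with a high cardinality attribute, each distinct value has its own sparse bitmap, and a range query ORs many of them into one uncompressed accumulator.

```python
n_groups = 100
a = compress(groups_from_positions([5, 3000], n_groups))   # 4 entries instead of 100 groups
b = compress(groups_from_positions([40], n_groups))        # 3 entries instead of 100 groups

result = [0] * n_groups
for bitmap in (a, b):
    or_in_place(result, bitmap)
```

The accumulator is allocated once at full size, and each sparse input then contributes work bounded by its own compressed length, which is the intuition behind the linear-time claim in the abstract.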

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Computational and Technology Research (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
822860
Report Number(s):
LBNL-54673; R&D Project: KS3310; TRN: US200414%%320
Resource Relation:
Conference: VLDB 2004: 30th International Conference on Very Large Data Bases, Toronto, Ontario (CA), 08/30/2004--09/03/2004; Other Information: PBD: 5 Mar 2004
Country of Publication:
United States
Language:
English