OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices

Abstract

Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices do not work efficiently. A common approach for reducing the size of a bitmap index for high-cardinality attributes is to group ranges of values of an attribute into bins and then build a bitmap for each bin rather than for each value of the attribute. Binning reduces storage costs; however, results of queries based on bins often require additional filtering to discard false positives, i.e., records in the result that do not satisfy the query constraints. This additional filtering, also known as "candidate checking," requires access to the base data on disk and involves significant I/O costs. This paper studies strategies for minimizing the I/O costs of candidate checking for multi-dimensional queries. This is done by determining the number of bins allocated for each dimension and then placing bin boundaries in optimal locations. Our algorithms use knowledge of the data distribution and the query workload. We derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real-life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.
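The abstract describes binning a high-cardinality attribute, building one bitmap per bin, and candidate-checking the partially covered edge bins of a range query against the base data. The sketch below is not the authors' implementation; it is a minimal Python illustration, assuming equi-depth (quantile) bin boundaries as one simple way to use the data distribution, of why only the two edge bins of a range query trigger the candidate-check I/O that the paper's bin-allocation strategies aim to minimize. All function names here are hypothetical.

import bisect
import numpy as np

def equi_depth_boundaries(values, num_bins):
    # Place boundaries at quantiles so each bin holds roughly equal record counts.
    qs = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    return [float(q) for q in np.quantile(values, qs)]

def build_binned_bitmaps(values, boundaries):
    # One bitmap (row) per bin; bit i is set if record i falls into that bin.
    bitmaps = np.zeros((len(boundaries) + 1, len(values)), dtype=bool)
    for i, v in enumerate(values):
        bitmaps[bisect.bisect_right(boundaries, v), i] = True
    return bitmaps

def range_query(values, bitmaps, boundaries, lo, hi):
    # Answer lo <= x < hi. Bins fully inside the range are resolved from the
    # bitmaps alone; the two edge bins may contain false positives, so their
    # records are checked against the base data (the I/O cost being minimized).
    first = bisect.bisect_right(boundaries, lo)   # bin containing lo
    last = bisect.bisect_right(boundaries, hi)    # bin containing hi
    hits = np.zeros(len(values), dtype=bool)
    if last > first + 1:
        hits |= bitmaps[first + 1:last].any(axis=0)
    for b in {first, last}:                       # candidate check the edge bins
        for i in np.flatnonzero(bitmaps[b]):
            if lo <= values[i] < hi:
                hits[i] = True
    return np.flatnonzero(hits)

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)
bounds = equi_depth_boundaries(data, 16)
index = build_binned_bitmaps(data, bounds)
result = range_query(data, index, bounds, -0.5, 0.5)
assert set(result) == set(np.flatnonzero((data >= -0.5) & (data < 0.5)))

In this toy setting, the number of candidate checks depends on how many records land in the two edge bins, so the choice of bin counts and boundary locations directly controls the candidate-checking I/O, which is the trade-off against index size that the paper analyzes.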

Authors:
Rotem, Doron; Stockinger, Kurt; Wu, Kesheng
Publication Date:
2006-03-30
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director, Office of Science, Office of Advanced Scientific Computing Research
OSTI Identifier:
898945
Report Number(s):
LBNL-59949
R&D Project: 429201; BnR: KJ0101030; TRN: US200706%%450
DOE Contract Number:
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: International Conference on Scientific and Statistical Database Management (SSDBM 2006), Vienna, Austria, July 3-5, 2006
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; DIMENSIONS; DISTRIBUTION; EVALUATION; MANAGEMENT; PROCESSING; STORAGE; bitmap index; query optimization; databases

Citation Formats

Rotem, Doron, Stockinger, Kurt, and Wu, Kesheng. Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices. United States: N. p., 2006. Web.
Rotem, Doron, Stockinger, Kurt, & Wu, Kesheng. Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices. United States.
Rotem, Doron, Stockinger, Kurt, and Wu, Kesheng. 2006. "Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices". United States. https://www.osti.gov/servlets/purl/898945.
@article{osti_898945,
title = {Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices},
author = {Rotem, Doron and Stockinger, Kurt and Wu, Kesheng},
abstractNote = {Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices do not work efficiently. A common approach for reducing the size of a bitmap index for high-cardinality attributes is to group ranges of values of an attribute into bins and then build a bitmap for each bin rather than for each value of the attribute. Binning reduces storage costs; however, results of queries based on bins often require additional filtering to discard false positives, i.e., records in the result that do not satisfy the query constraints. This additional filtering, also known as "candidate checking," requires access to the base data on disk and involves significant I/O costs. This paper studies strategies for minimizing the I/O costs of candidate checking for multi-dimensional queries. This is done by determining the number of bins allocated for each dimension and then placing bin boundaries in optimal locations. Our algorithms use knowledge of the data distribution and the query workload. We derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real-life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.},
place = {United States},
year = {2006},
month = {3}
}

Other availability:
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
