U.S. Department of Energy
Office of Scientific and Technical Information

Relational aggregate query processing techniques for real-time databases

Thesis/Dissertation
OSTI ID: 5897525
In this thesis, a new query processing technique is proposed to systematically reduce the time needed to process aggregate relational algebra queries in relational databases. The author uses data samples and statistical methods to construct an approximate response to a given aggregate query when there is not enough time to process the query completely. The thesis considers only COUNT(E) type queries, where E is an arbitrary relational algebra expression. The author makes no assumptions about the distributions of attribute values or the ordering of tuples in the input relations, and proposes consistent and unbiased estimators for arbitrary COUNT(E) type queries. A sampling plan based on the cluster sampling method is designed to improve the utilization of sampled data and to reduce the cost of sampling. He iteratively determines the appropriate number of sample tuples to draw from each relation, such that an aggregate query with the set of samples as input can be evaluated within the given time quota with a desired probability. He uses adaptive time cost formulas to estimate the time cost of evaluating a sample, and proposes a run-time approach to estimating the selectivities of relational operators. Finally, a set of time-control strategies is proposed to govern the use of the time quota, so that the quota is used efficiently while the risk of overspending it is kept under control. Performance evaluations of the proposed estimators and the time-control strategies are also reported.
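As a rough illustration of the idea of estimating a COUNT from a cluster sample, the following minimal Python sketch scales the number of qualifying tuples found in randomly chosen blocks up to the whole relation. It is not the estimator from the thesis, which the abstract does not spell out: the function `estimate_count`, the block layout, and the predicate are all hypothetical, and the query is a simple selection rather than an arbitrary relational algebra expression.

```python
import random


def estimate_count(blocks, predicate, sample_size, rng):
    """Estimate COUNT(select predicate) from a random sample of blocks.

    Assumption (not from the thesis): the relation is stored as a list of
    blocks (clusters), and blocks are drawn without replacement with equal
    probability, so scaling by len(blocks) / sample_size is unbiased.
    """
    sampled = rng.sample(blocks, sample_size)
    hits = sum(1 for block in sampled for row in block if predicate(row))
    return len(blocks) / sample_size * hits


# Hypothetical relation: 100 blocks of 10 integer tuples each.
blocks = [[b * 10 + i for i in range(10)] for b in range(100)]


def is_multiple_of_7(value):
    return value % 7 == 0


# Exact answer, computed by scanning every block.
exact = sum(1 for block in blocks for row in block if is_multiple_of_7(row))

# Approximate answer from reading only 20 of the 100 blocks.
estimate = estimate_count(blocks, is_multiple_of_7, 20, random.Random(0))
print(exact, estimate)
```

Reading whole blocks rather than individual tuples means every tuple fetched contributes to the estimate, which is the kind of saving in sampling cost that the abstract attributes to cluster sampling. A larger `sample_size` takes longer to evaluate but gives a tighter estimate, which is the trade-off a time quota has to balance.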
Research Organization:
Case Western Reserve Univ., Cleveland, OH (USA)
Country of Publication:
United States
Language:
English