Towards Building XML Statistics for the Hidden Web. Ashraf Aboulnaga, Jeffrey F. Naughton

Summary: Towards Building XML Statistics for the Hidden Web
Ashraf Aboulnaga (IBM Almaden Research Center, aashraf@almaden.ibm.com)
Jeffrey F. Naughton (University of Wisconsin - Madison, naughton@cs.wisc.edu)
There is currently a lot of interest in developing Internet query processors that can pose elaborate queries
on XML data on the Web. Such query processors can query data sources that have static XML files, but
they should also be able to query "hidden Web" data sources that export an XML view of data stored in a
database. To optimize queries that involve these hidden Web data sources, we need to have XML statistics
that can be used to estimate the selectivity of queries posed to these sources. Since we can only access the
data at a hidden Web data source by issuing queries, we need to develop on-line XML statistics that are built
by observing queries to a hidden Web data source and their result sizes.

In this paper, we assume that queries to a hidden Web data source are XPath selections from a virtual
XML document that represents all the data at this source. We observe the user XPath queries to the data
source and convert them to a more abstract and generalized form that we call annotated path expressions.
We describe an on-line statistics structure that stores such annotated path expressions and information about
their selectivity for use in estimating the selectivity of future XPath queries. We experimentally demonstrate
the convergence and accuracy of our proposed on-line statistics using real and synthetic XML data sets.
March 11, 2003
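
The idea of building statistics purely from observed queries and their result sizes can be illustrated with a minimal sketch. It is not the authors' structure: the summary does not define annotated path expressions, so the sketch assumes a much cruder generalization (dropping all predicates from the XPath string) and estimates selectivity as the mean observed result size per generalized path. The names `generalize` and `OnlineSelectivityStats` are hypothetical, and the predicate stripping does not handle a `]` inside a string literal.

```python
import re
from collections import defaultdict

# Matches one innermost predicate, e.g. "[price>10]" (no nested brackets inside).
_PREDICATE = re.compile(r"\[[^\[\]]*\]")


def generalize(xpath: str) -> str:
    """Assumed generalization: drop every predicate so that queries
    differing only in their predicates share one statistics entry."""
    previous = None
    while previous != xpath:
        # Repeat until stable so nested predicates are removed inside-out.
        previous = xpath
        xpath = _PREDICATE.sub("", xpath)
    return xpath


class OnlineSelectivityStats:
    """Hypothetical feedback store: remembers result sizes of past queries."""

    def __init__(self) -> None:
        self._total = defaultdict(int)   # sum of observed result sizes per path
        self._count = defaultdict(int)   # number of observations per path

    def observe(self, xpath: str, result_size: int) -> None:
        """Record the result size returned by the data source for a query."""
        key = generalize(xpath)
        self._total[key] += result_size
        self._count[key] += 1

    def estimate(self, xpath: str, default=None):
        """Estimate the result size of a new query from past observations.

        Returns `default` when no query with the same generalized path
        has been observed yet.
        """
        key = generalize(xpath)
        if self._count.get(key, 0) == 0:
            return default
        return self._total[key] / self._count[key]


if __name__ == "__main__":
    stats = OnlineSelectivityStats()
    stats.observe("/site/item[price>10]/name", 40)
    stats.observe("/site/item[price>50]/name", 20)
    print(stats.estimate("/site/item[price>99]/name"))
    print(stats.estimate("/site/person"))
```

Averaging per stripped path discards exactly the predicate information that a real selectivity estimator would need; the sketch only shows the feedback loop of observing a query, recording its result size, and reusing that record for later estimates.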


Source: Aboulnaga, Ashraf - School of Computer Science, University of Waterloo


Collections: Computer Technologies and Information Sciences