Summary: Using a Relational Database for Scalable XML Search
Rebecca J. Cathey, Steven M. Beitzel, Eric C. Jensen, David Grossman, Ophir Frieder
Information Retrieval Laboratory
Department of Computer Science
Illinois Institute of Technology
Chicago, IL 60616
{cathey, beitzel, jensen, grossman, frieder}@ir.iit.edu
June 3, 2007
Abstract
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments.
Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods
exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational
approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process
queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search
XML. This method maps XML queries to SQL and reconstructs the XML from the database results. To
date, the relational approach to XML processing has seen limited acceptance because the relational
schema must be redesigned each time a new XML hierarchy is defined. We, in contrast, describe a relational approach
that uses a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these
potentially longer runtimes are still significantly shorter than those of the tree approach.
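
The fixed-schema idea described above can be illustrated with a minimal sketch. It is not the schema or query translation used by the authors: the table `node`, its columns, and the helpers `shred` and `query_path` are hypothetical names chosen for this example. The sketch assumes that every element is stored as one row holding its root-to-element path, so a simple absolute path query becomes a single SQL lookup, and documents with different hierarchies share the same table.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store every element of an XML document in one fixed table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, path TEXT, tag TEXT, value TEXT)"
    )
    root = ET.fromstring(xml_text)
    # Depth-first, pre-order walk so that row ids follow document order.
    stack = [(root, None, "")]
    while stack:
        elem, parent_id, parent_path = stack.pop()
        path = parent_path + "/" + elem.tag
        cur = conn.execute(
            "INSERT INTO node (parent, path, tag, value) VALUES (?, ?, ?, ?)",
            (parent_id, path, elem.tag, (elem.text or "").strip()),
        )
        # Push children in reverse so the first child is processed next.
        for child in reversed(list(elem)):
            stack.append((child, cur.lastrowid, path))


def query_path(conn, path):
    """Answer a simple absolute path query (e.g. /a/b/c) with one SQL statement."""
    rows = conn.execute(
        "SELECT value FROM node WHERE path = ? ORDER BY id", (path,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Two documents with unrelated hierarchies go into the same table.
    shred(conn, "<people><person><name>Ann</name></person>"
                "<person><name>Bob</name></person></people>")
    shred(conn, "<catalog><item><name>Pen</name></item></catalog>")
    print(query_path(conn, "/people/person/name"))
    print(query_path(conn, "/catalog/item/name"))
```

The second call to `shred` loads a document with a different hierarchy without any change to the table definition, which is the property the fixed-schema approach relies on. The `parent` column is kept so that the original XML could be reconstructed from the rows, although this sketch does not do so.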
We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections