Summary: Extracting Relations from XML Documents
Eugene Agichtein¹, C. T. Howard Ho², Vanja Josifovski², and Joerg Gerhardt²
¹ Columbia University, New York, NY, USA
eugene@cs.columbia.edu
² IBM Almaden, San Jose, CA, USA
{ho,vanja}@almaden.ibm.com
Abstract. XML is becoming a prevalent format for data exchange.
Many XML documents have complex schemas that are not always known,
and can vary widely between information sources and applications. In
contrast, database applications rely mainly on the flat relational model.
We propose a novel, partially supervised approach for extracting user-defined
relations from XML documents with unknown schema. The extracted relations
can be directly used by an RDBMS, or utilized for information integration
or data mining tasks. Our method attempts to au-