Summary: Extracting Relations from XML Documents
, C. T. Howard Ho2, Vanja Josifovski2, and Joerg
Columbia University, New York, NY, USA
IBM Almaden, San Jose, CA, USA
Abstract. XML is becoming a prevalent format for data exchange.
Many XML documents have complex schemas that are not always known,
and can vary widely between information sources and applications. In
contrast, database applications rely mainly on the flat relational model.
We propose a novel, partially supervised approach for extracting user-defined
relations from XML documents with unknown schema. The extracted
relations can be directly used by an RDBMS, or utilized for information
integration or data mining tasks. Our method attempts to au-