Summary: Snowball: A Prototype System for Extracting Relations
from Large Text Collections
Eugene Agichtein, Luis Gravano, Jeff Pavel, Viktoriya Sokolova, Aleksandr Voskoboynik
Computer Science Department
Columbia University
{eugene,gravano,jeff,vicky,av69}@cs.columbia.edu
Text documents often hide valuable structured data. For
example, a collection of newspaper articles might contain
information on the location of the headquarters of a number
of organizations. If we need to find the location of the
headquarters of, say, Microsoft, we could try to use traditional
information-retrieval techniques to find documents that
contain the answer to our query. Alternatively, we could
answer such a query more precisely if we somehow had
available a table listing all the organization-location pairs
that are mentioned in our document collection. One could view
the extraction process as automatically building a
materialized view over the unstructured text data. In this demo
we present an interactive prototype of our Snowball system for
extracting relations from collections of plain-text documents
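
To make the organization-location table concrete, the following is a minimal sketch of how such a table could answer the Microsoft query directly, without searching the documents. It is an illustration only and not part of Snowball: the rows are typed in by hand as assumptions rather than extracted from text, and the names ORG_LOCATION_PAIRS and headquarters_of are hypothetical.

```python
from typing import List, Tuple

# Hypothetical, hand-entered rows standing in for an extracted
# organization-location table. They are NOT output of Snowball.
ORG_LOCATION_PAIRS: List[Tuple[str, str]] = [
    ("Microsoft", "Redmond"),
    ("IBM", "Armonk"),
    ("Intel", "Santa Clara"),
]


def headquarters_of(organization: str,
                    table: List[Tuple[str, str]] = ORG_LOCATION_PAIRS) -> List[str]:
    """Return every location paired with the given organization.

    Matching ignores case. A list is returned because a table built
    from text may hold zero, one, or several locations per organization.
    """
    wanted = organization.lower()
    return [location for org, location in table if org.lower() == wanted]


if __name__ == "__main__":
    print(headquarters_of("Microsoft"))
```

Once the table exists, the query reduces to a lookup over its rows, which is the sense in which the table acts like a materialized view over the text.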