Summary: Combining Strategies for Extracting Relations from Text Collections
Eugene Agichtein Eleazar Eskin Luis Gravano
Department of Computer Science
Columbia University
{eugene,eeskin,gravano}@cs.columbia.edu
Abstract
Text documents often contain valuable structured data
that is hidden in regular English sentences. This data is
best exploited if available as a relational table that we
could use for answering precise queries or for running
data mining tasks. Our Snowball system extracts these
relations from document collections starting with only
a handful of user-provided example tuples. Based on
these tuples, Snowball generates patterns that are used,
in turn, to find more tuples. In this paper we introduce a
new pattern and tuple generation scheme for Snowball,
whose strengths and weaknesses differ from those of
our original system. We also show preliminary results
on how we can combine the two versions of Snowball
to extract tuples more accurately.
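
The tuple-to-pattern-to-tuple loop described above can be illustrated with a minimal sketch. It is not Snowball's actual pattern representation or scoring: it assumes a pattern is just the literal text found between the two fields of a known tuple, that candidate fields are runs of capitalized words, and that every match is accepted without any confidence filtering. The function names, the example sentences, and the (organization, location) relation are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: a candidate field is a run of capitalized words.
NAME = r"([A-Z][\w&]*(?: [A-Z][\w&]*)*)"

def find_patterns(sentences, tuples):
    """Collect the literal text that separates the fields of known tuples."""
    patterns = set()
    for s in sentences:
        for org, loc in tuples:
            m = re.search(re.escape(org) + r"(.{1,40}?)" + re.escape(loc), s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Use each collected middle context to propose (org, loc) pairs."""
    found = set()
    for s in sentences:
        for p in patterns:
            for m in re.finditer(NAME + re.escape(p) + NAME, s):
                found.add((m.group(1), m.group(2)))
    return found

def bootstrap(sentences, seeds, rounds=5):
    """Alternate pattern generation and tuple extraction until nothing new appears."""
    tuples = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(sentences, tuples)
        new = apply_patterns(sentences, patterns) - tuples
        if not new:
            break
        tuples |= new
    return tuples

# Hypothetical toy collection and a single seed tuple.
sentences = [
    "Microsoft is headquartered in Redmond.",
    "Intel is headquartered in Santa Clara.",
    "Intel, based in Santa Clara, announced a new chip.",
    "Boeing, based in Seattle, reported earnings.",
]
seeds = {("Microsoft", "Redmond")}
result = bootstrap(sentences, seeds)
print(sorted(result))
```

In this toy run the seed yields one pattern, that pattern finds a second tuple, and the second tuple exposes a further pattern that reaches a third. Because the sketch accepts every match, a single overly general pattern would flood the table with wrong tuples, which is why accurate pattern and tuple generation matters in a real system.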