ACM DL 2000 Snowball: Extracting Relations from Large Plain-Text Collections
 

Eugene Agichtein Luis Gravano
Department of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027-7003, USA
{eugene,gravano}@cs.columbia.edu
ABSTRACT
Text documents often contain valuable structured data that
is hidden in regular English sentences. This data is best
exploited if available as a relational table that we could use for
answering precise queries or for running data mining tasks.
We explore a technique for extracting such tables from
document collections that requires only a handful of training
examples from users. These examples are used to generate
extraction patterns, which in turn result in new tuples being
extracted from the document collection. We build on this
idea and present our Snowball system. Snowball introduces
novel strategies for generating patterns and extracting tuples
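
The loop the abstract describes (seed examples generate patterns, patterns extract new tuples, and the new tuples feed the next round) can be sketched as below. This is only an illustrative toy under stated assumptions, not Snowball's actual pattern-generation or tuple-extraction strategies, which the abstract does not detail. The (organization, location) relation, the sample sentences, the capitalized-word entity matcher, and all function names are hypothetical.

```python
import re

# Assumption: an "entity" is a single capitalized word. A real system would
# need a far more robust way to recognize entities.
ENTITY = r"([A-Z][A-Za-z]*)"

def learn_patterns(sentences, known):
    """Collect the literal text found between the two halves of a known pair."""
    patterns = set()
    for left, right in known:
        for s in sentences:
            m = re.search(re.escape(left) + r"(.+?)" + re.escape(right), s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Find entity pairs separated by any learned middle context."""
    found = set()
    for middle in patterns:
        regex = re.compile(ENTITY + re.escape(middle) + ENTITY)
        for s in sentences:
            for left, right in regex.findall(s):
                found.add((left, right))
    return found

def bootstrap(sentences, seeds, rounds=2):
    """Alternate pattern learning and tuple extraction for a few rounds."""
    known = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(sentences, known)
        new = apply_patterns(sentences, patterns) - known
        if not new:  # nothing new was extracted, so stop early
            break
        known |= new
    return known

# Hypothetical toy collection and a single seed tuple.
SENTENCES = [
    "Microsoft is headquartered in Redmond.",
    "Exxon is headquartered in Irving.",
    "Exxon, based in Irving, reported earnings.",
    "Boeing, based in Seattle, builds aircraft.",
]
result = bootstrap(SENTENCES, {("Microsoft", "Redmond")})
```

In this toy run, the seed yields a first pattern that finds a second tuple, and that tuple in turn yields a second pattern that finds a third. The sketch accepts every exact string match and has no way to reject a bad pattern or tuple, so a single wrong extraction would propagate through later rounds.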

  

Source: Agichtein, Eugene - Department of Mathematics and Computer Science, Emory University

 

Collections: Computer Technologies and Information Sciences