Unsupervised Named-Entity Extraction from the Web: An Experimental Study
 

Summary: Unsupervised Named-Entity Extraction from the Web: An Experimental Study
Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu
Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195-2350
etzioni@cs.washington.edu
February 28, 2005
Abstract
The KNOWITALL system aims to automate the tedious process of extracting large
collections of facts (e.g., names of scientists or politicians) from the Web in an
unsupervised, domain-independent, and scalable manner. The paper presents an overview of
KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability
to extract information without any hand-labeled training examples. In its first major run,
KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we
improve KNOWITALL's recall and extraction rate without sacrificing precision?
This paper presents three distinct ways to address this challenge and evaluates their
performance. Pattern Learning learns domain-specific extraction rules, which enable
additional extractions. Subclass Extraction automatically identifies sub-classes in order
to boost recall

  

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Cafarella, Michael J. - Department of Electrical Engineering and Computer Science, University of Michigan
Weld, Daniel S. - Department of Computer Science and Engineering, University of Washington at Seattle
Yates, Alexander - Department of Computer and Information Sciences, Temple University

 

Collections: Computer Technologies and Information Sciences