The Use of Web-based Statistics to Validate Information Extraction
 

Summary: The Use of Web-based Statistics to Validate Information Extraction
Stephen Soderland, Oren Etzioni, Tal Shaked, and Daniel S. Weld
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195-2350
U.S.A.
{soderlan,etzioni,shaked,weld}@cs.washington.edu
Abstract
The World Wide Web is a powerful and readily available text corpus that
can be used effectively to validate the output of an information
extraction system. We present experiments that explore how pointwise
mutual information (PMI) from search engine hit counts can be used in an
Assessor module that assigns a probability that an extracted fact or
relationship is correct, thus boosting precision. We find that
thresholding on PMI scores is more effective in creating features for the
Assessor than using probability density models. Bootstrapping can be
effective in finding both positive and
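
To illustrate the idea of turning search engine hit counts into thresholded PMI features, here is a minimal sketch. The abstract does not give the scoring formula, so the ratio used below (hits for the extraction together with a discriminator phrase, divided by hits for the extraction alone) is an assumption, as are the function names, the hit counts, and the threshold values, all of which are hypothetical.

```python
from typing import List


def pmi_score(hits_with_phrase: int, hits_instance: int) -> float:
    """PMI-style score from hit counts (assumed formulation, not from the abstract).

    hits_with_phrase: hits for the extracted instance together with a
                      discriminator phrase (e.g. "city of <instance>").
    hits_instance:    hits for the extracted instance alone.
    """
    if hits_instance <= 0:
        # No evidence for the instance at all; treat the score as zero.
        return 0.0
    return hits_with_phrase / hits_instance


def threshold_features(scores: List[float], thresholds: List[float]) -> List[bool]:
    """Convert one PMI score per discriminator into Boolean features.

    Each feature is True when its score meets or exceeds its threshold.
    """
    if len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must have the same length")
    return [s >= t for s, t in zip(scores, thresholds)]


if __name__ == "__main__":
    # Hypothetical hit counts for one extraction and two discriminator phrases.
    scores = [pmi_score(50, 1000), pmi_score(1, 1000)]
    # Hypothetical thresholds; in practice these would be learned from labeled seeds.
    print(threshold_features(scores, [0.01, 0.01]))
```

The resulting Boolean features could then feed a probabilistic classifier that plays the role of the Assessor; which classifier the authors use is not stated in this excerpt.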

  

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Weld, Daniel S. - Department of Computer Science and Engineering, University of Washington at Seattle

 

Collections: Computer Technologies and Information Sciences