Summary: Mining Reference Tables for Automatic Text Segmentation
Eugene Agichtein
Columbia University
eugene@cs.columbia.edu

Venkatesh Ganti
Microsoft Research
vganti@microsoft.com

ABSTRACT
Automatically segmenting unstructured text strings into structured
records is necessary for importing the information contained in legacy
sources and text collections into a data warehouse for subsequent
querying, analysis, mining and integration. In this paper, we mine
tables present in data warehouses and relational databases to develop
an automatic segmentation system. Thus, we overcome limitations
of existing supervised text segmentation approaches, which require
comprehensive manually labeled training data. Our segmentation
system is robust, accurate, and efficient, and requires no additional
manual effort. Thorough evaluation on real datasets demonstrates the
robustness and accuracy of our system, with segmentation accuracy

  

Source: Agichtein, Eugene - Department of Mathematics and Computer Science, Emory University
Collections: Computer Technologies and Information Sciences