Towards High Speed Grammar Induction on Large Text Corpora
 

Summary: Towards High Speed Grammar Induction on Large Text Corpora
Pieter Adriaans (1, 2), Marten Trautwein (1), and Marco Vervoort (2)
(1) Perot Systems Nederland BV, P.O. Box 2729, NL-3800 GG Amersfoort, The Netherlands
{Pieter.Adriaans,Marten.Trautwein}@ps.net
(2) University of Amsterdam, FdNWI, Plantage Muidergracht 24, NL-1018 TV Amsterdam, The Netherlands
vervoort@wins.uva.nl
Abstract. In this paper we describe an efficient and scalable implementation
for grammar induction based on the EMILE approach ([2], [3], [4], [5], [6]).
The current EMILE 4.1 implementation ([11]) is one of the first
efficient grammar induction algorithms that work on free text. Although
EMILE 4.1 is far from perfect, it enables researchers to do empirical
grammar induction research on various types of corpora.
The EMILE approach is based on notions from categorial grammar (cf.
[10]), which is known to generate the class of context-free languages.
EMILE learns from positive examples only (cf. [1], [7], [9]). We describe
the algorithms underlying the approach and some interesting practical
results on small and large text collections. As shown in the articles men­

  

Source: Adriaans, Pieter - Instituut voor Informatica, Universiteit van Amsterdam

 

Collections: Computer Technologies and Information Sciences