Towards High Speed Grammar Induction on Large Text Corpora

Pieter Adriaans 1,2, Marten Trautwein 1, and Marco Vervoort 2
1 Perot Systems Nederland BV, P.O. Box 2729, NL-3800 GG Amersfoort, The Netherlands
2 University of Amsterdam, FdNWI, Plantage Muidergracht 24, NL-1018 TV
Amsterdam, The Netherlands
Abstract. In this paper we describe an efficient and scalable implementation
for grammar induction based on the EMILE approach ([2], [3], [4],
[5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first
efficient grammar induction algorithms that work on free text. Although
EMILE 4.1 is far from perfect, it enables researchers to do empirical
grammar induction research on various types of corpora.
The EMILE approach is based on notions from categorial grammar (cf.
[10]), which is known to generate the class of context-free languages.
EMILE learns from positive examples only (cf. [1], [7], [9]). We describe
the algorithms underlying the approach and some interesting practical
results on small and large text collections. As shown in the articles men­


Source: Adriaans, Pieter - Instituut voor Informatica, Universiteit van Amsterdam


Collections: Computer Technologies and Information Sciences