
Using MDL for grammar induction
Pieter Adriaans and Ceriel Jacobs

Summary:
In this paper we study the application of the Minimum Description
Length principle (or two-part-code optimization) to grammar induction
in the light of recent developments in Kolmogorov complexity theory. We
focus on issues that are important for the construction of effective
compression algorithms. We define an independent measure for the quality
of a theory given a data set: the randomness deficiency. This is a measure
of how typical the data set is for the theory. It cannot be computed,
but in many relevant cases it can be approximated. An optimal theory
has minimal randomness deficiency. Using results from Vereshchagin and
Vitanyi [2004] and Adriaans and Vitanyi [2005] we show that:
• Shorter code does not necessarily lead to better theories. We prove
  that, in DFA induction, a single deterministic merge of two nodes
  can already cause the randomness deficiency and the MDL code length
  to diverge.
• Contrary to what is suggested by the results of Gold [1967], there is
  no fundamental difference between positive and negative data from
  an MDL perspective.
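
For orientation, the two quantities contrasted in the summary can be written out as follows. This is a sketch in the standard notation of the Kolmogorov complexity literature, not notation taken from the summary itself: the symbols x (the data set), S (a theory, viewed as a finite set of data sets containing x), K (prefix Kolmogorov complexity), and Lambda (two-part code length) are assumptions made here for illustration.

```latex
% Two-part (MDL) code length of data x under theory S, with x \in S:
% first describe the theory, then index x within it.
\Lambda(S) = K(S) + \log_2 |S|

% Randomness deficiency of x with respect to S:
% small when x is a typical element of S.
\delta(x \mid S) =
  \begin{cases}
    \log_2 |S| - K(x \mid S) & \text{if } x \in S, \\
    \infty                   & \text{otherwise.}
  \end{cases}
```

Because K is not computable, neither quantity can be evaluated exactly, which is consistent with the remark above that the randomness deficiency can only be approximated. The first claim in the list says that lowering the two-part code length does not by itself guarantee a lower randomness deficiency.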


Source: Adriaans, Pieter - Instituut voor Informatica, Universiteit van Amsterdam


Collections: Computer Technologies and Information Sciences