Generating grammars for SGML tagged texts lacking DTD

Summary: Generating grammars for SGML tagged texts lacking DTD
Helena Ahonen
University of Helsinki
Heikki Mannila
University of Helsinki
Erja Nikunen
Research Centre for Domestic Languages
We describe a technique for forming a context-free grammar for a
document that has some kind of tagging (structural or typographical)
but for which no concise description of the structure is available.
The technique is based on ideas from machine learning. It first forms
a set of finite-state automata describing the document completely.
These automata are modified by considering certain context conditions;
the modifications correspond to generalizing the underlying languages.
Finally, the automata are converted into regular expressions, which are
then used to construct the grammar. An alternative representation,
characteristic k-grams, is also introduced. Additionally, the paper
describes some interactive operations necessary for generating a gram-
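To make the first step of the summary more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each tagged element has been reduced to the sequence of its child tag names, builds a prefix-tree automaton that accepts exactly the observed sequences, and lists plain contiguous k-grams. The tag names (`headword`, `pron`, `sense`) and the function names are hypothetical, and the k-gram helper is a generic enumeration that may differ from the paper's definition of characteristic k-grams. The generalization and regular-expression steps are not shown.

```python
def build_prefix_tree(sequences):
    """Build a prefix-tree automaton accepting exactly the given sequences.

    Returns (transitions, accepting), where transitions maps
    (state, symbol) -> state and state 0 is the start state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for seq in sequences:
        state = 0
        for symbol in seq:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, seq):
    """Return True if the automaton accepts the sequence of tag names."""
    transitions, accepting = automaton
    state = 0
    for symbol in seq:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


def kgrams(sequences, k):
    """Collect every contiguous window of k tag names (illustrative only)."""
    grams = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            grams.add(tuple(seq[i:i + k]))
    return grams


# Hypothetical child-tag sequences observed for one element type.
examples = [
    ["headword", "sense"],
    ["headword", "sense", "sense"],
    ["headword", "pron", "sense"],
]

automaton = build_prefix_tree(examples)
bigrams = kgrams(examples, 2)
```

Because the prefix tree accepts only the sequences it was built from, an entry with three `sense` children would be rejected; this is the kind of gap the generalization step described in the summary is meant to close.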


Source: Ahonen, Helena - Department of Computer Science, University of Helsinki


Collections: Computer Technologies and Information Sciences