Discovery of Frequent Word Sequences in Text, Helena Ahonen-Myka
 

Summary: Discovery of Frequent Word Sequences in Text
Helena Ahonen-Myka
University of Helsinki
Department of Computer Science
P.O. Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland
helena.ahonen-myka@cs.helsinki.fi
Abstract. We have developed a method that extracts all maximal frequent
word sequences from the documents of a collection. A sequence is said to
be frequent if it appears in more than σ documents, where σ is the given
frequency threshold. Furthermore, a sequence is maximal if no other
frequent sequence exists that contains it. The words of a sequence do
not have to appear consecutively in the text.
In this paper, we briefly describe the method for finding all maximal
frequent word sequences in text and then extend it to extract generalized
sequences from annotated texts, where each word has a set of additional
(e.g. morphological) features attached to it. We aim to discover patterns
that preserve as many features as possible while the frequency of the
pattern still exceeds the given frequency threshold.
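
The definitions in the abstract can be illustrated with a small brute-force sketch. This is not the method of the paper, which is not reproduced here; it only enumerates candidate sequences directly to show what "frequent" and "maximal" mean when gaps between words are allowed. The function names, the toy documents, and the `max_len` cap are assumptions made for the illustration, and with the cap in place maximality is only relative to sequences of at most `max_len` words.

```python
from itertools import combinations


def is_subsequence(pattern, words):
    """True if the words of `pattern` occur in `words` in order, gaps allowed."""
    it = iter(words)
    return all(w in it for w in pattern)


def frequent_sequences(docs, sigma, max_len):
    """Return all word sequences (up to max_len words) found in more than sigma documents.

    Brute force for illustration only: every ordered subsequence of every
    document is enumerated, which is exponential in max_len.
    """
    counts = {}
    for doc in docs:
        seen = set()  # count each sequence at most once per document
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(doc)), n):
                seen.add(tuple(doc[i] for i in idx))
        for seq in seen:
            counts[seq] = counts.get(seq, 0) + 1
    return {seq for seq, c in counts.items() if c > sigma}


def maximal_sequences(freq):
    """Keep only frequent sequences not contained in another frequent sequence."""
    return {
        s for s in freq
        if not any(s != t and is_subsequence(s, t) for t in freq)
    }


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on a mat".split(),
    "a dog sat on the mat".split(),
]
freq = frequent_sequences(docs, sigma=1, max_len=4)
for seq in sorted(maximal_sequences(freq)):
    print(" ".join(seq))
```

With σ = 1, a sequence must appear in at least two of the three toy documents. The sequence "the cat on mat" qualifies even though its words are not adjacent in either document that contains it, and shorter frequent sequences such as "on mat" are dropped because a longer frequent sequence contains them.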
1 Introduction

  

Source: Ahonen, Helena - Department of Computer Science, University of Helsinki

 

Collections: Computer Technologies and Information Sciences