Summary: Finding All Maximal Frequent Sequences in Text
Helena Ahonen-Myka
Wilhelm-Schickard-Institut für Informatik,
University of Tübingen,
Sand 13,
D-72076 Tübingen, Germany
helena.ahonen@acm.org
Abstract
In this paper we present a novel algorithm
for discovering maximal frequent sequences
in a set of documents, i.e., such sequences
of words that are frequent in the document
collection and, moreover, that are not contained
in any other longer frequent sequence.
A sequence is considered frequent if it
appears in at least σ documents, where σ is
a given frequency threshold. Our approach
combines bottom-up and greedy methods,
and, hence, is able to extract maximal sequences
without considering all the frequent