Summary: PREPRINT. In Proc. Combinatorial Pattern Matching, 1996.
Suffix Trees on Words
Arne Andersson, N. Jesper Larsson, Kurt Swanson
Dept. of Computer Science, Lund University,
Box 118, S-221 00 Lund, Sweden
{arne,jesper,kurt}@dna.lth.se
Abstract. We discuss an intrinsic generalization of the suffix tree,
designed to index a string of length n which has a natural partitioning into
m multicharacter substrings or words. This word suffix tree represents
only the m suffixes that start at word boundaries. These boundaries are
determined by delimiters, whose definition depends on the application.
Since traditional suffix tree construction algorithms rely heavily on the
fact that all suffixes are inserted, construction of a word suffix tree is
nontrivial, in particular when only O(m) construction space is allowed. We
solve this problem, presenting an algorithm with O(n) expected running
time. In general, construction cost is Ω(n) due to the need of scanning
the entire input. In applications that require strict node ordering, an
additional cost of sorting O(m') characters arises, where m' is the number