A fast compact prefix encoding for pattern matching in limited resources devices

Summary:
S. Harrusi, A. Averbuch, N. Rabin
School of Computer Science
Tel Aviv University, Tel Aviv 69978, Israel
This paper improves the Tagged Suboptimal Codes (TSC) compression scheme in
several ways. We show how to process the TSC as a universal code. We introduce
the TSCk as a family of universal codes, where TSC0 is the original TSC. Instead
of constructing an optimal code such as Huffman, we choose the best near-optimal
code from the TSCk family of universal prefix codes. We introduce a fast decoding
technique that uses compact transition tables to decode the compressed data
as bytes. We adapt the Aho-Corasick pattern matching algorithm to use the same
compact tables that are used in the decoding process, in order to perform fast
pattern matching in the TSCk compressed domain. These improvements make the
TSCk compression scheme fast and compact. The encoding, decoding and search
times of the TSCk compression scheme are similar. This makes the TSCk an ideal
compression scheme for processing text in a streaming mode on a
machine/device that has limited memory (several kilobytes). Our experiments show
that both TSC0 and TSC1 are suitable for English text compression. Experiments show that the


Source: Averbuch, Amir - School of Computer Science, Tel Aviv University


Collections: Computer Technologies and Information Sciences