Discovering sequence similarity by the algorithmic significance method

Milosavljevic, A

Conference · February 1, 1993

OSTI ID:10165238

The minimal-length encoding approach is applied to define the concept of sequence similarity. A sequence is defined to be similar to another sequence or to a set of keywords if it can be encoded in a small number of bits by taking advantage of common subwords. The minimal-length encoding of a sequence is computed in linear time, using a data compression algorithm based on a dynamic programming strategy and the directed acyclic word graph data structure. No assumptions about common word ("k-tuple") length are made in advance, and common words of any length are considered. The newly proposed algorithmic significance method provides an exact upper bound on the probability that sequence similarity has occurred by chance, thus eliminating the need for any arbitrary choice of similarity thresholds. Preliminary experiments indicate that a small number of keywords can positively identify a DNA sequence, which is extremely relevant in the context of partial sequencing by hybridization.
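
The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a quadratic-time dynamic program with plain substring search rather than the linear-time directed acyclic word graph method. The cost model (a 1-bit flag per token, 2 bits per literal base, fixed-width position and length fields per pointer) and the uniform 2-bits-per-base null model are assumptions made for illustration, as are the function names `min_encoding_bits` and `significance_bound`. The bound used is the general one behind the method: for a code fixed before the target is seen, the probability under the null model of saving d or more bits is at most 2^-d.

```python
import math


def min_encoding_bits(target: str, source: str) -> int:
    """Fewest bits needed to encode `target` as literals plus pointers into `source`.

    Illustrative cost model (an assumption, not taken from the paper):
      literal = 1 flag bit + 2 bits for one base
      pointer = 1 flag bit + position field + length field
    `source` must be non-empty.
    """
    n = len(target)
    field_bits = max(1, math.ceil(math.log2(len(source))))
    literal_bits = 1 + 2
    pointer_bits = 1 + 2 * field_bits

    # cost[i] holds the fewest bits needed to encode target[i:].
    cost = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best = literal_bits + cost[i + 1]
        length = 1
        # Try every common word starting at position i; no word length is fixed in advance.
        while i + length <= n and target[i:i + length] in source:
            best = min(best, pointer_bits + cost[i + length])
            length += 1
        cost[i] = best
    return cost[0]


def significance_bound(target: str, source: str) -> tuple[int, float]:
    """Return (d, bound): bits saved versus the null model, and the bound 2**-d capped at 1."""
    null_bits = 2 * len(target)  # assumed null model: independent, uniform bases
    d = null_bits - min_encoding_bits(target, source)
    return d, min(1.0, 2.0 ** -d)
```

For example, `significance_bound(seq, reference)` returns the number of bits saved together with an upper bound on the probability of saving that many bits by chance, so no separate similarity threshold has to be chosen. A negative `d` means the encoding is longer than the null encoding, and the bound is then the trivial value 1.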

Research Organization: Argonne National Lab., IL (United States)

Sponsoring Organization: USDOE, Washington, DC (United States)

DOE Contract Number: W-31109-ENG-38; FG03-91ER61152

OSTI ID: 10165238

Report Number(s): ANL/BIM/CP-78918; CONF-930745-2; ON: DE93015556

Resource Relation: Conference: 1. international conference in intelligent systems for molecular biology, Washington, DC (United States), 7-9 Jul 1993; Other Information: PBD: Feb 1993

Country of Publication: United States

Language: English

Similar Records

Discovering dependencies via algorithmic mutual information: a case study in DNA sequence comparisons

Journal Article · October 1, 1995 · Mach. Learning · OSTI ID:10165238

Milosavljevic, A

A distance-based block searching algorithm

Technical Report · December 31, 1995 · OSTI ID:10165238

Sagot, M F; Viari, A; Soldano, H

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
DNA SEQUENCING
ALGORITHMS
PATTERN RECOGNITION
OLIGONUCLEOTIDES
LENGTH
550200
550400
990200
BIOCHEMISTRY
GENETICS
MATHEMATICS AND COMPUTERS