DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Method and apparatus for biological sequence comparison

Abstract

A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
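The filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes ungapped identity scoring (1 per matching residue) in place of PAM-based scores, half-block overlap between blocks, and non-overlapping fragment tiles when bounding a region. All function and parameter names (`overlapping_blocks`, `best_fragment_scores`, `candidate_regions`, `local_alignment_score`, `block_len`, `k`, `region_len`, `unit_score`) are hypothetical and chosen only for illustration.

```python
def overlapping_blocks(query, block_len):
    """Split the query into blocks of block_len that overlap by half a block."""
    if len(query) <= block_len:
        return [query]
    step = max(1, block_len // 2)
    last = len(query) - block_len
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the query is covered
    return [query[s:s + block_len] for s in starts]


def best_fragment_scores(block, target, k):
    """For every length-k fragment of target, return its best ungapped
    identity score against any length-k window of the block."""
    block_kmers = {block[i:i + k] for i in range(len(block) - k + 1)}
    scores = []
    for j in range(len(target) - k + 1):
        frag = target[j:j + k]
        best = max(
            (sum(a == b for a, b in zip(frag, bk)) for bk in block_kmers),
            default=0,
        )
        scores.append(best)
    return scores


def candidate_regions(scores, k, region_len, unit_score):
    """Keep regions whose mean per-residue bound exceeds unit_score and
    merge overlapping ones into (start, end) intervals on the target."""
    n = region_len // k  # number of non-overlapping fragment tiles per region
    kept = []
    for s in range(len(scores) - (n - 1) * k):
        total = sum(scores[s + t * k] for t in range(n))
        if total / (n * k) > unit_score:
            end = s + n * k
            if kept and s <= kept[-1][1]:
                kept[-1] = (kept[-1][0], max(kept[-1][1], end))
            else:
                kept.append((s, end))
    return kept


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score with a linear gap cost."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

In this sketch, a block would be scored against each known sequence with `best_fragment_scores`, the surviving intervals from `candidate_regions` would be sliced out of that sequence, and only those slices would be passed to `local_alignment_score`. Joining slices into one string can create spurious matches across the joins, so a real implementation would need to keep region boundaries distinct.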

Inventors:
Marr, Thomas G [1]; Chang, William I-Wei [1]
  1. Huntington, NY
Issue Date:
Research Org.:
Cold Spring Harbor Lab., NY (United States)
OSTI Identifier:
871286
Patent Number(s):
5701256
Assignee:
Cold Spring Harbor Laboratory (Cold Spring Harbor, NY)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
G - PHYSICS G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
DOE Contract Number:  
FG02-91ER61190
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
method; apparatus; biological; sequence; comparison; comparing; sequences; source; subject; query; takes; input; set; target; similarity; levels; evolutionary; distances; units; pam; fragments; similar; level; statistically; significant; device; filters; average; required; compared; remaining; matches; filtering; divides; overlapping; blocks; block; sufficiently; contain; minimum-length; alignment; filter; compares; fragment; determines; match; determined; provide; upper; threshold; values; regions; length; mean; value; unit; score; concatenated; form; union; current; provides; indication; local; determined set; /702/382/

Citation Formats

Marr, Thomas G, and Chang, William I-Wei. Method and apparatus for biological sequence comparison. United States: N. p., 1997. Web.
Marr, Thomas G, & Chang, William I-Wei. Method and apparatus for biological sequence comparison. United States.
Marr, Thomas G, and Chang, William I-Wei. 1997. "Method and apparatus for biological sequence comparison". United States. https://www.osti.gov/servlets/purl/871286.
@article{osti_871286,
title = {Method and apparatus for biological sequence comparison},
author = {Marr, Thomas G and Chang, William I-Wei},
abstractNote = {A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.},
doi = {},
journal = {},
place = {United States},
year = {1997}
}
