Method and apparatus for biological sequence comparison
Abstract
A method and apparatus for comparing a subject (query) sequence with biological sequences from a known source of sequences. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM) and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short or that have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union, and the comparison provides an indication of best local alignment with the subject sequence.
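The filtering steps described in the abstract can be loosely sketched as follows. This is an illustrative toy, not the patented implementation: the match/mismatch/gap scores, the fragment length `k`, the `window` size, the `unit_score` cutoff, and all function names are assumptions introduced here, and the per-fragment best score is only a rough stand-in for the upper threshold the abstract refers to. A real system would presumably use PAM-derived scores.

```python
from typing import List, Tuple

# Toy scores (assumption); the abstract implies PAM-style similarity scoring.
MATCH, MISMATCH, GAP = 1, -1, -2


def overlapping_blocks(query: str, block_len: int) -> List[str]:
    """Split the query into blocks that overlap by half their length."""
    step = max(1, block_len // 2)
    last = max(1, len(query) - block_len + step)
    return [query[i:i + block_len] for i in range(0, last, step)]


def best_fragment_scores(block: str, target: str, k: int) -> List[int]:
    """For each length-k target fragment, best ungapped score against any length-k block fragment."""
    block_kmers = {block[j:j + k] for j in range(len(block) - k + 1)}
    if not block_kmers:  # block shorter than k: nothing to compare
        return []
    scores = []
    for i in range(len(target) - k + 1):
        frag = target[i:i + k]
        scores.append(max(
            sum(MATCH if x == y else MISMATCH for x, y in zip(frag, bk))
            for bk in block_kmers))
    return scores


def candidate_regions(scores: List[int], k: int, window: int,
                      unit_score: float) -> List[Tuple[int, int]]:
    """Merge target windows whose mean per-residue fragment score exceeds unit_score."""
    regions: List[Tuple[int, int]] = []
    for s in range(len(scores) - window + 1):
        mean = sum(scores[s:s + window]) / (window * k)
        if mean > unit_score:
            end = s + window - 1 + k  # exclusive end in target coordinates
            if regions and s <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((s, end))
    return regions


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (MATCH if x == y else MISMATCH)
            cur.append(max(0, diag, prev[j] + GAP, cur[j - 1] + GAP))
            best = max(best, cur[j])
        prev = cur
    return best


def filtered_search(query: str, target: str, block_len: int = 8, k: int = 4,
                    window: int = 3, unit_score: float = 0.5):
    """Return (block, regions, score) for each block with surviving target regions."""
    hits = []
    for block in overlapping_blocks(query, block_len):
        scores = best_fragment_scores(block, target, k)
        regions = candidate_regions(scores, k, window, unit_score)
        if not regions:
            continue  # the whole target was filtered out for this block
        # '#' matches no residue, so an alignment crossing it pays a mismatch.
        union = "#".join(target[s:e] for s, e in regions)
        hits.append((block, regions, local_alignment_score(block, union)))
    return hits


if __name__ == "__main__":
    print(filtered_search("ACGTACGT", "TTTTTTTT" + "ACGTACGT" + "GGGGGGGG"))
```

The point of the sketch is the ordering of work: the cheap fragment comparison runs over the whole known sequence, while the more expensive local alignment runs only over the regions that survive the filter.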
- Inventors:
- Marr, Thomas G; Chang, William I-Wei
- Huntington, NY
- Issue Date:
- Research Org.:
- Cold Spring Harbor Lab., NY (United States)
- OSTI Identifier:
- 871286
- Patent Number(s):
- 5701256
- Assignee:
- Cold Spring Harbor Laboratory (Cold Spring Harbor, NY)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- G - PHYSICS; G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS; G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- DOE Contract Number:
- FG02-91ER61190
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- method; apparatus; biological; sequence; comparison; comparing; sequences; source; subject; query; takes; input; set; target; similarity; levels; evolutionary; distances; units; pam; fragments; similar; level; statistically; significant; device; filters; average; required; compared; remaining; matches; filtering; divides; overlapping; blocks; block; sufficiently; contain; minimum-length; alignment; filter; compares; fragment; determines; match; determined; provide; upper; threshold; values; regions; length; mean; value; unit; score; concatenated; form; union; current; provides; indication; local; determined set; /702/382/
Citation Formats
Marr, Thomas G, and Chang, William I-Wei. Method and apparatus for biological sequence comparison. United States: N. p., 1997. Web.
Marr, Thomas G, & Chang, William I-Wei (1997). Method and apparatus for biological sequence comparison. United States.
Marr, Thomas G, and Chang, William I-Wei. 1997. "Method and apparatus for biological sequence comparison". United States. https://www.osti.gov/servlets/purl/871286.
@article{osti_871286,
title = {Method and apparatus for biological sequence comparison},
author = {Marr, Thomas G and Chang, William I-Wei},
abstractNote = {A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.},
place = {United States},
year = {1997},
month = {1}
}
Works referenced in this record:
The theory and computation of evolutionary distances: Pattern recognition
journal, December 1980
- Sellers, Peter H.
- Journal of Algorithms, Vol. 1, Issue 4
A general method applicable to the search for similarities in the amino acid sequence of two proteins
journal, March 1970
- Needleman, Saul B.; Wunsch, Christian D.
- Journal of Molecular Biology, Vol. 48, Issue 3, p. 443-453
Theoretical and empirical comparisons of approximate string matching algorithms
book, January 1992
- Chang, William I.; Lampe, Jordan
- Combinatorial Pattern Matching
Fast text searching: allowing errors
journal, October 1992
- Wu, Sun; Manber, Udi
- Communications of the ACM, Vol. 35, Issue 10
A time-efficient, linear-space local similarity algorithm
journal, September 1991
- Huang, Xiaoqiu; Miller, Webb
- Advances in Applied Mathematics, Vol. 12, Issue 3
A subquadratic algorithm for approximate limited expression matching
journal, January 1996
- Wu, Sun; Manber, U.; Myers, G.
- Algorithmica, Vol. 15, Issue 1
Fast string matching with k differences
journal, August 1988
- Landau, Gad M.; Vishkin, Uzi
- Journal of Computer and System Sciences, Vol. 37, Issue 1
An improved algorithm for matching biological sequences
journal, December 1982
- Gotoh, Osamu
- Journal of Molecular Biology, Vol. 162, Issue 3
A contig assembly program based on sensitive detection of fragment overlaps
journal, September 1992
- Huang, Xiaoqiu
- Genomics, Vol. 14, Issue 1
Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries
journal, January 1982
- Goad, Walter B.; Kanehisa, Minoru I.
- Nucleic Acids Research, Vol. 10, Issue 1
Algorithmic Advances for Searching Biosequence Databases
book, January 1994
- Myers, Eugene W.
- Computational Methods in Genome Research
Pattern recognition in genetic sequences by mismatch density
journal, July 1984
- Sellers, Peter H.
- Bulletin of Mathematical Biology, Vol. 46, Issue 4
Finding approximate patterns in strings
journal, March 1985
- Ukkonen, Esko
- Journal of Algorithms, Vol. 6, Issue 1
Protein sequence comparison: methods and significance
journal, January 1991
- Argos, Patrick; Vingron, Martin; Vogt, Gerhard
- Protein Engineering, Design and Selection, Vol. 4, Issue 4
Approximate string matching in sublinear expected time
conference, January 1990
- Chang, W. I.; Lawler, E. L.
- Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science
Identification of common molecular subsequences
journal, March 1981
- Smith, T. F.; Waterman, M. S.
- Journal of Molecular Biology, Vol. 147, Issue 1, p. 195-197
A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons
journal, October 1987
- Waterman, Michael S.; Eggert, Mark
- Journal of Molecular Biology, Vol. 197, Issue 4
Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms
journal, November 1991
- Pearson, William R.
- Genomics, Vol. 11, Issue 3
[20] Mutation data matrix and its uses
book, January 1990
- George, David G.; Barker, Winona C.; Hunt, Lois T.
- Methods in Enzymology
Sublinear approximate string matching and biological applications
journal, November 1994
- Chang, W. I.; Lawler, E. L.
- Algorithmica, Vol. 12, Issue 4-5
A sublinear algorithm for approximate keyword searching
journal, November 1994
- Myers, E. W.
- Algorithmica, Vol. 12, Issue 4-5
[21] Sensitivity comparison of protein amino acid sequences
book, January 1990
- Argos, Patrick; Vingron, Martin
- Methods in Enzymology