Method and apparatus for biological sequence comparison
A method and apparatus are disclosed for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM) and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short or that have a lower average similarity to the subject sequence than each target similarity level requires. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filtering member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The resulting set of short-fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences whose mean alignment-value upper threshold is greater than a target unit score are concatenated to form a union. The current block is then compared to the union, which provides an indication of the best local alignment with the subject sequence. 5 figs.
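The filtering stage described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the +1/-1 match/mismatch scoring (used here in place of a PAM matrix), the fixed fragment length `k`, and the sliding `window` of fragment scores are all assumptions chosen for illustration. The final step, a local alignment of each block against the union, is not shown.

```python
def overlapping_blocks(query, block_len, step):
    """Yield (start, block) pairs that cover the query with overlap."""
    if len(query) <= block_len:
        yield 0, query
        return
    last = len(query) - block_len
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the query is covered
    for s in starts:
        yield s, query[s:s + block_len]


def fragment_score(a, b, match=1, mismatch=-1):
    """Ungapped score of two equal-length fragments (assumed toy scoring)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def upper_bounds(block, target, k):
    """For each length-k fragment of target, best score against any
    length-k fragment of the block; used as an upper bound per position."""
    block_kmers = {block[i:i + k] for i in range(len(block) - k + 1)}
    return [
        max(fragment_score(target[j:j + k], bk) for bk in block_kmers)
        for j in range(len(target) - k + 1)
    ]


def candidate_regions(bounds, k, window, unit_score):
    """Merged (start, end) target intervals whose mean per-residue
    upper bound over `window` consecutive fragments exceeds unit_score."""
    regions = []
    for j in range(len(bounds) - window + 1):
        mean_per_residue = sum(bounds[j:j + window]) / (window * k)
        if mean_per_residue > unit_score:
            start, end = j, j + window - 1 + k
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


def filtered_union(block, target, k, window, unit_score):
    """Concatenate the surviving target regions for one query block."""
    bounds = upper_bounds(block, target, k)
    regions = candidate_regions(bounds, k, window, unit_score)
    return "".join(target[s:e] for s, e in regions)


if __name__ == "__main__":
    target = "TTTTTTTT" + "ACGTACGT" + "TTTTTTTT"
    for start, block in overlapping_blocks("ACGTACGT", block_len=8, step=4):
        print(start, filtered_union(block, target, k=4, window=3, unit_score=0.9))
```

In this sketch, only the stretch of the target that can still reach the required mean score survives the filter, so a more expensive local alignment would run on a much shorter string than the full database sequence.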
- Research Organization: Cold Spring Harbor Lab
- Sponsoring Organization: USDOE, Washington, DC (United States); National Insts. of Health, Bethesda, MD (United States)
- DOE Contract Number: FG02-91ER61190
- Assignee: Cold Spring Harbor Lab., NY (United States)
- Patent Number(s): US 5,701,256/A/
- Application Number: PAN: 8-455,654; CNN: Grant 1R01 HG0020301A1
- OSTI ID: 563676
- Resource Relation: Other Information: PBD: 23 Dec 1997
- Country of Publication: United States
- Language: English