U.S. Department of Energy
Office of Scientific and Technical Information

Alignment of DNA and protein sequences containing frameshift errors

Technical Report · OSTI ID:71519

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have significant error rates and often generate artifactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only sub-fragments of the correct translation are available in any given frame. We present a new algorithm that detects and corrects frameshift errors in DNA sequences while comparing translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The time complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
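To make the idea concrete, the following is a minimal sketch of a frameshift-tolerant local alignment between a DNA sequence and a protein sequence. It is not the algorithm from the report, whose recurrences are not given in this record. It is a generic Smith-Waterman-style dynamic program with two extra moves that skip one or two nucleotides, so an alignment can change reading frame at a cost. The function names and the scoring values (`match`, `mismatch`, `gap`, `shift`) are illustrative assumptions; a practical tool would use a substitution matrix such as BLOSUM or PAM and affine gap penalties.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward reading frame; unknown codons become 'X'."""
    return "".join(CODON.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def frameshift_local_score(dna, protein, match=5, mismatch=-4,
                           gap=-6, shift=-8):
    """Best local alignment score of dna (forward strand) vs. protein.

    S[i][j] is the best score of an alignment ending after dna[:i] and
    protein[:j]. All scoring values are hypothetical defaults.
    Runs in O(n * m) time for n nucleotides and m residues.
    """
    n, m = len(dna), len(protein)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            cands = [0, S[i - 1][j] + shift]           # skip one base
            if i >= 2:
                cands.append(S[i - 2][j] + shift)      # skip two bases
            if i >= 3:
                cands.append(S[i - 3][j] + gap)        # codon vs. gap
            if j >= 1:
                cands.append(S[i][j - 1] + gap)        # residue vs. gap
                if i >= 3:                             # codon vs. residue
                    aa = CODON.get(dna[i - 3:i], "X")
                    sub = match if aa == protein[j - 1] else mismatch
                    cands.append(S[i - 3][j - 1] + sub)
            S[i][j] = max(cands)
            best = max(best, S[i][j])
    return best


protein = "MKTAYIAKQR"
dna = "ATGAAAACTGCTTATATTGCTAAACAACGT"   # encodes the protein exactly
bad = dna[:12] + "G" + dna[12:]          # one artifactual inserted base

print(translate(dna))                        # MKTAYIAKQR
print(frameshift_local_score(dna, protein))  # 50
print(frameshift_local_score(bad, protein))  # 42
```

In this toy case a single inserted base splits the correct translation across two reading frames, so no single frame recovers the whole protein. Allowing a penalized one-base skip lets the alignment span the error and keep all ten residue matches. A fuller version would also scan the reverse-complement strand and trace back through the matrix to report where the frameshifts occur.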

Research Organization:
Oak Ridge National Lab., TN (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States)
DOE Contract Number:
AC05-84OR21400
OSTI ID:
71519
Report Number(s):
ORNL/TM--12976; ON: DE95010432
Country of Publication:
United States
Language:
English

Similar Records

An iterative algorithm for correcting sequencing errors in DNA coding regions
Conference · Dec 30, 1995 · OSTI ID:205860

Sensitive and error-tolerant annotation of protein-coding DNA with BATH
Journal Article · Jun 14, 2024 · Bioinformatics Advances · OSTI ID:2510958