An iterative algorithm for correcting sequencing errors in DNA coding regions
Insertion and deletion (indel) sequencing errors in DNA coding regions disrupt DNA-to-protein translation frames and therefore cause most frame-sensitive coding-recognition approaches to fail. This paper extends the authors' previous work on indel detection and correction algorithms and presents a more effective algorithm that localizes indels in DNA coding regions and corrects them by inserting or deleting DNA bases. The algorithm localizes indels by detecting changes of the preferred translation frame within presumed coding regions. It then corrects the indel errors to restore a consistent translation frame within each coding region. An iterative strategy repeatedly localizes and corrects indel errors until no more indels can be found. Test results show that the algorithm can accurately locate the positions of indels. The technology presented here has proved very useful for single-pass EST/cDNA or genomic sequences, and is often beneficial for higher-quality sequences from large genomic clones as well.
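The localize-then-correct loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The frame-preference score (fewest stop codons per window), the fixed window and step sizes, the decision to place each correction at the start of the first window that prefers the new frame, the filler base `N`, and the names `stop_count`, `preferred_frames`, `correct_once`, and `correct_indels` are all hypothetical assumptions made for illustration. The whole input is treated as one presumed coding region.

```python
# Hypothetical sketch: the frame score (stop-codon counts) is an assumption,
# not the scoring used by the authors.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_count(seq: str, start: int, end: int, frame: int) -> int:
    """Count stop codons lying fully inside seq[start:end] whose first base p has p % 3 == frame."""
    first = start + (frame - start) % 3  # smallest p >= start in this frame
    return sum(1 for p in range(first, end - 2, 3) if seq[p:p + 3] in STOP_CODONS)


def preferred_frames(seq: str, window: int, step: int) -> list[tuple[int, int]]:
    """Return (window_start, frame) pairs; the preferred frame has the fewest stops (ties -> lowest frame)."""
    frames = []
    for start in range(0, len(seq) - window + 1, step):
        counts = [stop_count(seq, start, start + window, f) for f in range(3)]
        frames.append((start, min(range(3), key=counts.__getitem__)))
    return frames


def correct_once(seq: str, window: int, step: int, filler: str = "N") -> tuple[str, bool]:
    """Localize the first change of preferred frame and apply one single-base correction.

    The correction is placed at the start of the first window in the new frame,
    so localization is only as precise as `step`.
    """
    frames = preferred_frames(seq, window, step)
    for (_, prev), (start, cur) in zip(frames, frames[1:]):
        shift = (cur - prev) % 3
        if shift == 1:
            # Downstream codons start one base later: treat it as an inserted base and delete one.
            return seq[:start] + seq[start + 1:], True
        if shift == 2:
            # Downstream codons start one base earlier: treat it as a deleted base and insert a filler.
            return seq[:start] + filler + seq[start:], True
    return seq, False


def correct_indels(seq: str, window: int = 27, step: int = 27, max_iterations: int = 100) -> str:
    """Repeatedly localize and correct until no frame change is found or the iteration cap is hit."""
    for _ in range(max_iterations):
        seq, changed = correct_once(seq, window, step)
        if not changed:
            break
    return seq


if __name__ == "__main__":
    unit = "TTAAATGAC"  # frame 0 has no stops; frames 1 and 2 each have one stop per unit
    clean = unit * 12
    noisy = unit * 6 + "G" + unit * 6  # one extra base at position 54
    print(correct_indels(noisy) == clean)  # True
```

In this toy input the extra base shifts the preferred frame from 0 to 1 at the window starting at position 54, and the sketch deletes exactly that base. A +1 shift could equally come from two deletions; one base deletion restores the frame in either case. A practical scorer would use coding statistics rather than stop codons alone, and corrections that fall inside a window rather than on its boundary would leave a few altered codons between the indel and the correction point.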
- Research Organization:
- Argonne National Lab., IL (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States)
- DOE Contract Number:
- AC05-84OR21400
- OSTI ID:
- 205860
- Report Number(s):
- CONF-9510318--2; ON: DE96005359
- Country of Publication:
- United States
- Language:
- English