Title: DNA sequence confidence estimation

Abstract

A significant bottleneck in the current DNA sequencing process is the manual editing of trace data generated by automated DNA sequencers. This step is used to correct base calls and to associate to each base call a confidence level. The confidence levels are used in the assembly process to determine overlaps and to resolve discrepancies in determining the consensus sequence. This single step may cost as much as 4 to 8 cents per finished base. The authors report an approach to automated trace editing using classification trees to detect and exploit context-based patterns in trace peak heights. Local base composition and nearby peak heights account for 80% of the variations in peak heights. Classification algorithms were developed to identify 37% of automated base calls that differ from the consensus sequence. With these algorithms, 12% of the base calls had confidence levels less than 90%. 16 refs., 7 figs., 3 tabs.

Authors:
 Lipshutz, R J [1]; Taverner, F [2]; Hennessy, K [3]; Hartzell, G [4]; Davis, R [5]
  1. Affymetrix, Santa Clara, CA (United States)
  2. Daniel H. Wagner Associates, Sunnyvale, CA (United States)
  3. Applied Biosystems, Inc., Foster City, CA (United States)
  4. Univ. of California, Berkeley, CA (United States)
  5. Stanford Univ., CA (United States)
Publication Date:
1994-02-01
OSTI Identifier:
6482756
Resource Type:
Journal Article
Journal Name:
Genomics; (United States)
Additional Journal Information:
Journal Volume: 19; Journal Issue: 3; Journal ID: ISSN 0888-7543
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; AUTOMATION; ALGORITHMS; DNA SEQUENCING; MATHEMATICAL LOGIC; STRUCTURAL CHEMICAL ANALYSIS; 550400* - Genetics; 550200 - Biochemistry; 990200 - Mathematics & Computers

Citation Formats

MLA: Lipshutz, R J, Taverner, F, Hennessy, K, Hartzell, G, and Davis, R. DNA sequence confidence estimation. United States: N. p., 1994. Web. doi:10.1006/geno.1994.1089.
APA: Lipshutz, R J, Taverner, F, Hennessy, K, Hartzell, G, & Davis, R. (1994). DNA sequence confidence estimation. United States. https://doi.org/10.1006/geno.1994.1089
Chicago: Lipshutz, R J, Taverner, F, Hennessy, K, Hartzell, G, and Davis, R. 1994. "DNA sequence confidence estimation". United States. https://doi.org/10.1006/geno.1994.1089.
@article{osti_6482756,
title = {DNA sequence confidence estimation},
author = {Lipshutz, R J and Taverner, F and Hennessy, K and Hartzell, G and Davis, R},
abstractNote = {A significant bottleneck in the current DNA sequencing process is the manual editing of trace data generated by automated DNA sequencers. This step is used to correct base calls and to associate to each base call a confidence level. The confidence levels are used in the assembly process to determine overlaps and to resolve discrepancies in determining the consensus sequence. This single step may cost as much as 4 to 8 cents per finished base. The authors report an approach to automated trace editing using classification trees to detect and exploit context-based patterns in trace peak heights. Local base composition and nearby peak heights account for 80% of the variations in peak heights. Classification algorithms were developed to identify 37% of automated base calls that differ from the consensus sequence. With these algorithms, 12% of the base calls had confidence levels less than 90%. 16 refs., 7 figs., 3 tabs.},
doi = {10.1006/geno.1994.1089},
url = {https://www.osti.gov/biblio/6482756},
journal = {Genomics; (United States)},
issn = {0888-7543},
number = {3},
volume = {19},
place = {United States},
year = {1994},
month = feb
}