DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ADEPT, a dynamic next generation sequencing data error-detection program with trimming

Journal Article · BMC Bioinformatics
 [1];  [1];  [1];  [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

Illumina is the most widely used next-generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors are a major problem in applications such as de novo genome assembly, metagenomics analysis, and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method that considers the quality score of each nucleotide and of its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, one of the most prominent issues in error prediction. ADEPT therefore improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
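The general idea in the abstract, judging a base by its own quality score and those of its neighbors relative to a position-specific distribution computed over the whole run, can be illustrated with a minimal Python sketch. This is not ADEPT's implementation: the function names, the 20th-percentile cutoff, the Phred+33 offset, and the rule "flag a base only if it and at least one immediate neighbor fall below the cutoff for their positions" are all assumptions chosen for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.
    The offset of 33 (Sanger / Illumina 1.8+) is an assumption."""
    return [ord(ch) - offset for ch in quality_line]


def position_cutoffs(reads_quals, percentile=20):
    """For every read position, return the nearest-rank `percentile`
    of the quality scores observed at that position across the run.
    The percentile value is a hypothetical tuning parameter."""
    max_len = max(len(q) for q in reads_quals)
    cutoffs = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        column = sorted(q[pos] for q in reads_quals if len(q) > pos)
        rank = (percentile * len(column) + 99) // 100  # integer ceiling
        cutoffs.append(column[max(rank, 1) - 1])
    return cutoffs


def flag_suspect_bases(quals, cutoffs):
    """Return 0-based positions whose score is below the cutoff for that
    position AND that have at least one immediate neighbor that is also
    below its own cutoff (illustrative rule, not ADEPT's)."""
    low = [score < cutoffs[pos] for pos, score in enumerate(quals)]
    flagged = []
    for pos, is_low in enumerate(low):
        if not is_low:
            continue
        neighbors = low[max(pos - 1, 0):pos] + low[pos + 1:pos + 2]
        if any(neighbors):
            flagged.append(pos)
    return flagged


# Toy run: nine uniformly high-quality reads and one read with a dip.
run = [[40, 40, 40, 40, 40] for _ in range(9)] + [[40, 10, 12, 40, 40]]
cutoffs = position_cutoffs(run)
suspects = flag_suspect_bases(run[-1], cutoffs)
```

Because the cutoffs are derived from the dataset itself, the same absolute quality score can be acceptable near the end of a read, where scores are typically lower across the run, yet suspicious in the middle, which is the position-dependent behavior the abstract emphasizes.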

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231; AC52-06NA25396; CB10152; Y1-DE-6006-02; HSHQDC08X00790; B104153I; B084531I
OSTI ID:
1248578
Report Number(s):
LA-UR-14-25592; PII: 967
Journal Information:
BMC Bioinformatics, Vol. 17, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)


