ADEPT, a dynamic next generation sequencing data error-detection program with trimming
Abstract
Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method that evaluates the quality score of each nucleotide and of its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
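The core idea in the abstract, judging each base against the quality distribution observed at the same read position and against its immediate neighbors, can be illustrated with a small sketch. This is not ADEPT's implementation: the function names, the z-score cutoff, the minimum-drop rule, and the toy quality values below are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def position_stats(all_quals):
    """Mean and population standard deviation of quality at each read position."""
    length = max(len(q) for q in all_quals)
    stats = []
    for pos in range(length):
        # Collect the quality scores that every read reports at this position.
        column = [q[pos] for q in all_quals if pos < len(q)]
        stats.append((mean(column), pstdev(column)))
    return stats


def flag_suspect_bases(read_quals, stats, z_cutoff=-1.5, min_drop=10):
    """Return positions that look like errors under a hypothetical rule.

    A base is flagged when (a) its quality is unusually low for its position
    in this dataset (z-score below z_cutoff) and (b) it is at least min_drop
    below the mean quality of its immediate neighbors in the same read.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    suspects = []
    for pos, q in enumerate(read_quals):
        mu, sd = stats[pos]
        z = 0.0 if sd == 0 else (q - mu) / sd
        neighbors = [
            read_quals[p] for p in (pos - 1, pos + 1) if 0 <= p < len(read_quals)
        ]
        if not neighbors:
            continue  # a single-base read has no local context to compare with
        drop = mean(neighbors) - q
        if z < z_cutoff and drop >= min_drop:
            suspects.append(pos)
    return suspects


# Toy Phred quality scores for four reads of length five (made-up values).
reads = [
    [38, 38, 37, 36, 30],
    [38, 37, 37, 35, 29],
    [38, 38, 12, 36, 31],
    [38, 38, 38, 35, 30],
]
stats = position_stats(reads)
print(flag_suspect_bases(reads[2], stats))  # → [2]
```

In this toy data the last position has low quality in every read, so it is not flagged. Only a score that is unusual both for its position and for its neighborhood stands out, which a single fixed quality threshold would not capture.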
- Authors:
- Feng, Shihai; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S. G.
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- 2016-02-29
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1248578
- Report Number(s):
- LA-UR-14-25592
- Journal ID: ISSN 1471-2105; PII: 967
- Grant/Contract Number:
- AC02-05CH11231; AC52-06NA25396; CB10152; Y1-DE-6006-02; HSHQDC08X00790; B104153I; B084531I
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Bioinformatics
- Additional Journal Information:
- Journal Volume: 17; Journal Issue: 1; Journal ID: ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; Next generation sequencing; Illumina error prediction; Local quality scores; Position-specific quality
Citation Formats
Feng, Shihai, Lo, Chien-Chi, Li, Po-E, and Chain, Patrick S. G. ADEPT, a dynamic next generation sequencing data error-detection program with trimming. United States: N. p., 2016. Web. doi:10.1186/s12859-016-0967-z.
Feng, Shihai, Lo, Chien-Chi, Li, Po-E, & Chain, Patrick S. G. ADEPT, a dynamic next generation sequencing data error-detection program with trimming. United States. https://doi.org/10.1186/s12859-016-0967-z
Feng, Shihai, Lo, Chien-Chi, Li, Po-E, and Chain, Patrick S. G. 2016. "ADEPT, a dynamic next generation sequencing data error-detection program with trimming". United States. https://doi.org/10.1186/s12859-016-0967-z. https://www.osti.gov/servlets/purl/1248578.
@article{osti_1248578,
title = {ADEPT, a dynamic next generation sequencing data error-detection program with trimming},
author = {Feng, Shihai and Lo, Chien-Chi and Li, Po-E and Chain, Patrick S. G.},
abstractNote = {Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method that evaluates the quality score of each nucleotide and of its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.},
doi = {10.1186/s12859-016-0967-z},
journal = {BMC Bioinformatics},
number = 1,
volume = 17,
place = {United States},
year = {2016},
month = feb
}