De novo Nanopore read quality improvement using deep learning
Abstract
BACKGROUND: Long-read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
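The "scrubbing" step described in the abstract can be illustrated with a minimal sketch. It is not MiniScrub's implementation: it assumes the CNN has already produced one quality score per fixed-length read segment, and the function name `scrub_read`, the segment length, the cutoff, and the minimum piece length are all hypothetical values chosen for illustration.

```python
from typing import List


def scrub_read(seq: str, scores: List[float], seg_len: int = 48,
               cutoff: float = 0.8, min_len: int = 1) -> List[str]:
    """Split `seq` wherever a segment's predicted quality is below `cutoff`.

    `scores[i]` is assumed to be the predicted quality of the segment
    seq[i * seg_len : (i + 1) * seg_len]. All parameter defaults are
    illustrative assumptions, not values taken from the paper.
    """
    pieces = []
    start = None  # start index of the current run of good segments
    for i, score in enumerate(scores):
        seg_start = i * seg_len
        if score >= cutoff:
            if start is None:
                start = seg_start  # open a new run of retained bases
        elif start is not None:
            pieces.append(seq[start:seg_start])  # close the run before the bad segment
            start = None
    if start is not None:
        # Bases beyond the last scored segment are dropped in this sketch.
        end = min(len(seq), len(scores) * seg_len)
        pieces.append(seq[start:end])
    # Discard retained pieces that are too short to be useful downstream.
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    read = "AAAA" + "CCCC" + "GGGG" + "TTTT"
    predicted = [0.9, 0.5, 0.95, 0.9]  # second segment is low quality
    print(scrub_read(read, predicted, seg_len=4))
```

In this toy run the low-scoring second segment is removed, so the read is split into the pieces on either side of it. Raising `min_len` would additionally drop short leftover pieces.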
- Authors:
- LaPierre, Nathan; Egan, Rob; Wang, Wei; Wang, Zhong
- Univ. of California, Los Angeles, CA (United States)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Merced, CA (United States)
- Publication Date:
- 2019-11-06
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
- OSTI Identifier:
- 1581387
- Grant/Contract Number:
- AC02-05CH11231; DGE-1829071; T3EB016640
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Bioinformatics
- Additional Journal Information:
- Journal Volume: 20; Journal Issue: 1; Journal ID: ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; deep learning; long sequence reads; Oxford Nanopore; de novo assembly
Citation Formats
LaPierre, Nathan, Egan, Rob, Wang, Wei, and Wang, Zhong. De novo Nanopore read quality improvement using deep learning. United States: N. p., 2019. Web. doi:10.1186/s12859-019-3103-z.
LaPierre, Nathan, Egan, Rob, Wang, Wei, & Wang, Zhong. De novo Nanopore read quality improvement using deep learning. United States. https://doi.org/10.1186/s12859-019-3103-z
LaPierre, Nathan, Egan, Rob, Wang, Wei, and Wang, Zhong. 2019. "De novo Nanopore read quality improvement using deep learning". United States. https://doi.org/10.1186/s12859-019-3103-z. https://www.osti.gov/servlets/purl/1581387.
@article{osti_1581387,
title = {De novo Nanopore read quality improvement using deep learning},
author = {LaPierre, Nathan and Egan, Rob and Wang, Wei and Wang, Zhong},
abstractNote = {BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub .},
doi = {10.1186/s12859-019-3103-z},
journal = {BMC Bioinformatics},
number = 1,
volume = 20,
place = {United States},
year = {2019},
month = {11}
}