DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: De novo Nanopore read quality improvement using deep learning

Abstract

BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub .
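
As a rough illustration of the "scrubbing" step described in the abstract, the sketch below removes low-quality segments from a read, given one quality score per fixed-length segment, and returns the remaining high-quality pieces. The function name `scrub_read`, the fixed segment length, the score threshold, and the minimum piece length are all assumptions made for this example; they are not taken from MiniScrub. In MiniScrub the per-segment predictions would come from the CNN, whereas here they are supplied by hand.

```python
from typing import List


def scrub_read(seq: str, seg_scores: List[float], seg_len: int,
               min_score: float, min_keep: int) -> List[str]:
    """Split a read at low-quality segments and return the kept pieces.

    seg_scores[i] is a hypothetical predicted quality for
    seq[i * seg_len:(i + 1) * seg_len]; higher means better.
    Bases beyond the last scored segment are ignored in this sketch.
    """
    pieces: List[str] = []
    current: List[str] = []  # consecutive high-quality segments being joined
    for i, score in enumerate(seg_scores):
        segment = seq[i * seg_len:(i + 1) * seg_len]
        if score >= min_score:
            current.append(segment)
        elif current:
            # A low-quality segment is dropped and ends the current piece.
            pieces.append("".join(current))
            current = []
    if current:
        pieces.append("".join(current))
    # Discard pieces shorter than the assumed minimum useful length.
    return [p for p in pieces if len(p) >= min_keep]


# Toy read with five 4-base segments; the third and fifth are low quality.
read = "AAAACCCCGGGGTTTTAAAA"
scores = [0.9, 0.8, 0.2, 0.95, 0.1]
kept = scrub_read(read, scores, seg_len=4, min_score=0.5, min_keep=4)
print(kept)
```

Splitting the read at each removed segment, rather than joining the flanking pieces, avoids creating artificial junctions that an assembler could mistake for real sequence; that choice is also an assumption of this sketch.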

Authors:
LaPierre, Nathan [1]; Egan, Rob [2]; Wang, Wei [1]; Wang, Zhong [3]
  1. Univ. of California, Los Angeles, CA (United States)
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Merced, CA (United States)
Publication Date:
2019-11-06
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
OSTI Identifier:
1581387
Grant/Contract Number:  
AC02-05CH11231; DGE-1829071; T3EB016640
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 20; Journal Issue: 1; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; deep learning; long sequence reads; Oxford Nanopore; de novo assembly

Citation Formats

LaPierre, Nathan, Egan, Rob, Wang, Wei, and Wang, Zhong. De novo Nanopore read quality improvement using deep learning. United States: N. p., 2019. Web. doi:10.1186/s12859-019-3103-z.
LaPierre, Nathan, Egan, Rob, Wang, Wei, & Wang, Zhong. (2019). De novo Nanopore read quality improvement using deep learning. United States. https://doi.org/10.1186/s12859-019-3103-z
LaPierre, Nathan, Egan, Rob, Wang, Wei, and Wang, Zhong. 2019. "De novo Nanopore read quality improvement using deep learning". United States. https://doi.org/10.1186/s12859-019-3103-z. https://www.osti.gov/servlets/purl/1581387.
@article{osti_1581387,
title = {De novo Nanopore read quality improvement using deep learning},
author = {LaPierre, Nathan and Egan, Rob and Wang, Wei and Wang, Zhong},
abstractNote = {BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub .},
doi = {10.1186/s12859-019-3103-z},
journal = {BMC Bioinformatics},
number = 1,
volume = 20,
place = {United States},
year = {2019},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 5 works
Citation information provided by Web of Science


Works referenced in this record:

QUAST: quality assessment tool for genome assemblies
journal, February 2013


Hybrid error correction and de novo assembly of single-molecule sequencing reads
journal, July 2012

  • Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.
  • Nature Biotechnology, Vol. 30, Issue 7
  • DOI: 10.1038/nbt.2280

Deep learning
journal, May 2015

  • LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
  • Nature, Vol. 521, Issue 7553
  • DOI: 10.1038/nature14539

Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome
journal, October 2015

  • Goodwin, Sara; Gurtowski, James; Ethe-Sayers, Scott
  • Genome Research, Vol. 25, Issue 11
  • DOI: 10.1101/gr.191395.115

A first look at the Oxford Nanopore MinION sequencer
journal, September 2014

  • Mikheyev, Alexander S.; Tin, Mandy M. Y.
  • Molecular Ecology Resources, Vol. 14, Issue 6
  • DOI: 10.1111/1755-0998.12324

LoRDEC: accurate and efficient long read error correction
journal, August 2014


Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
journal, March 2017

  • Koren, Sergey; Walenz, Brian P.; Berlin, Konstantin
  • Genome Research, Vol. 27, Issue 5
  • DOI: 10.1101/gr.215087.116

AdapterRemoval: Easy Cleaning of Next Generation Sequencing Reads
journal, January 2012


Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
journal, May 2013

  • Chin, Chen-Shan; Alexander, David H.; Marks, Patrick
  • Nature Methods, Vol. 10, Issue 6
  • DOI: 10.1038/nmeth.2474

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads
journal, September 2017

  • Xiao, Chuan-Le; Chen, Ying; Xie, Shang-Qian
  • Nature Methods, Vol. 14, Issue 11
  • DOI: 10.1038/nmeth.4432

A world of opportunities with nanopore sequencing
journal, August 2017

  • Leggett, Richard M.; Clark, Matthew D.
  • Journal of Experimental Botany, Vol. 68, Issue 20
  • DOI: 10.1093/jxb/erx289

Trimmomatic: a flexible trimmer for Illumina sequence data
journal, April 2014


Fast and sensitive mapping of nanopore sequencing reads with GraphMap
journal, April 2016

  • Sović, Ivan; Šikić, Mile; Wilm, Andreas
  • Nature Communications, Vol. 7, Issue 1
  • DOI: 10.1038/ncomms11307

ImageNet Large Scale Visual Recognition Challenge
journal, April 2015

  • Russakovsky, Olga; Deng, Jia; Su, Hao
  • International Journal of Computer Vision, Vol. 115, Issue 3
  • DOI: 10.1007/s11263-015-0816-y

PacBio Sequencing and Its Applications
journal, October 2015


Exploring genome characteristics and sequence quality without a reference
journal, January 2014


Defining a personal, allele-specific, and single-molecule long-read transcriptome
journal, June 2014

  • Tilgner, Hagen; Grubert, Fabian; Sharon, Donald
  • Proceedings of the National Academy of Sciences, Vol. 111, Issue 27
  • DOI: 10.1073/pnas.1400447111

Reducing storage requirements for biological sequence comparison
journal, July 2004


Nanocall: an open source basecaller for Oxford Nanopore sequencing data
journal, September 2016


Early insights into the potential of the Oxford Nanopore MinION for the detection of antimicrobial resistance genes
journal, July 2015

  • Judge, Kim; Harris, Simon R.; Reuter, Sandra
  • Journal of Antimicrobial Chemotherapy, Vol. 70, Issue 10
  • DOI: 10.1093/jac/dkv206

Next generation sequencing data of a defined microbial mock community
journal, September 2016

  • Singer, Esther; Andreopoulos, Bill; Bowers, Robert M.
  • Scientific Data, Vol. 3, Issue 1
  • DOI: 10.1038/sdata.2016.81

A universal SNP and small-indel variant caller using deep neural networks
journal, September 2018

  • Poplin, Ryan; Chang, Pi-Chuan; Alexander, David
  • Nature Biotechnology, Vol. 36, Issue 10
  • DOI: 10.1038/nbt.4235

Assembly complexity of prokaryotic genomes using short reads
journal, January 2010

  • Kingsford, Carl; Schatz, Michael C.; Pop, Mihai
  • BMC Bioinformatics, Vol. 11, Issue 1
  • DOI: 10.1186/1471-2105-11-21

DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads
journal, June 2017


Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology
journal, November 2012


Assessing the performance of the Oxford Nanopore Technologies MinION
journal, March 2015


Mutations in virus-derived small RNAs
journal, June 2020


Characterization and functional analysis of phytoene synthase gene family in tobacco
journal, January 2021


Charting the genomic landscape of seed-free plants
text, January 2021

  • Szövényi, Péter; Gunadi, Andika; Li, Fay-Wei
  • Nature Publishing Group
  • DOI: 10.5167/uzh-203460

ImageNet Large Scale Visual Recognition Challenge
text, January 2015

  • Deng, Jia; Karpathy, Andrej; Ma, Sean
  • The University of North Carolina at Chapel Hill University Libraries
  • DOI: 10.17615/009h-3a34

Deep Learning
text, January 2018


DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads
text, January 2016