Title: De novo Nanopore read quality improvement using deep learning

Journal Article · BMC Bioinformatics
 [1];  [2];  [1];  [3]
  1. Univ. of California, Los Angeles, CA (United States)
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Merced, CA (United States)

BACKGROUND: Long-read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area.

RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors.

CONCLUSIONS: MiniScrub is able to robustly improve the quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
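The "scrubbing" step described in the abstract can be illustrated with a minimal Python sketch. It assumes that some model has already produced one quality score per fixed-length segment of a read, and it cuts the read at every segment scoring below a cutoff. This is not MiniScrub's actual code: the function name `scrub_read`, the segment length, the score cutoff, and the minimum retained length are all hypothetical values chosen for illustration, since this record does not state the tool's parameters.

```python
from typing import List, Tuple


def scrub_read(
    sequence: str,
    segment_scores: List[float],
    segment_len: int = 48,    # hypothetical segment size in bases
    min_score: float = 0.8,   # hypothetical quality cutoff
    min_keep: int = 500,      # hypothetical minimum length of a kept piece
) -> List[Tuple[int, str]]:
    """Split a read at low-quality segments and return the surviving pieces.

    Each returned tuple is (start offset in the original read, subsequence).
    """
    # Only bases covered by a score are considered; the last segment may be short.
    covered_end = min(len(sequence), len(segment_scores) * segment_len)

    pieces: List[Tuple[int, str]] = []
    run_start = None  # start offset of the current run of good segments

    for i, score in enumerate(segment_scores):
        seg_start = i * segment_len
        if seg_start >= covered_end:
            break
        if score >= min_score:
            if run_start is None:
                run_start = seg_start
        elif run_start is not None:
            # A low-quality segment ends the current run of good segments.
            pieces.append((run_start, sequence[run_start:seg_start]))
            run_start = None

    if run_start is not None:
        pieces.append((run_start, sequence[run_start:covered_end]))

    # Drop pieces too short to be useful downstream.
    return [(start, seq) for start, seq in pieces if len(seq) >= min_keep]
```

Under these assumptions, a read with one low-scoring segment in the middle is returned as two shorter reads, and either piece is discarded if it falls below `min_keep`.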

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); National Science Foundation (NSF); National Institutes of Health (NIH)
Grant/Contract Number:
AC02-05CH11231; DGE-1829071; T3EB016640
OSTI ID:
1581387
Journal Information:
BMC Bioinformatics, Vol. 20, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 5 works (citation information provided by Web of Science)
