De novo Nanopore read quality improvement using deep learning
Journal Article · BMC Bioinformatics
- Univ. of California, Los Angeles, CA (United States)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Merced, CA (United States)
BACKGROUND: Long-read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area.
RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces far fewer mis-assemblies and large indel errors.
CONCLUSIONS: MiniScrub robustly improves the quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
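The "scrubbing" step described in the abstract can be illustrated with a minimal sketch. This is not MiniScrub's actual code: the function name `scrub_read`, the fixed segment length, the score cutoff, and the minimum retained length are all hypothetical choices made for illustration, and the per-segment scores are assumed to come from some upstream predictor (in the paper, a CNN applied to images of read-to-read overlaps).

```python
from typing import List


def scrub_read(sequence: str, segment_scores: List[float],
               segment_len: int = 50, cutoff: float = 0.8,
               min_len: int = 100) -> List[str]:
    """Split a read at low-scoring segments and keep the long-enough pieces.

    All parameter names and defaults are illustrative assumptions,
    not values taken from MiniScrub.
    """
    kept: List[str] = []
    current: List[str] = []  # consecutive segments that passed the cutoff

    def flush() -> None:
        # Emit the current run of passing segments if it is long enough.
        piece = "".join(current)
        if len(piece) >= min_len:
            kept.append(piece)
        current.clear()

    for i, score in enumerate(segment_scores):
        segment = sequence[i * segment_len:(i + 1) * segment_len]
        if score >= cutoff:
            current.append(segment)
        else:
            # A low-quality segment is dropped and ends the current run.
            flush()
    flush()
    # Bases beyond the last scored segment are ignored in this sketch.
    return kept


if __name__ == "__main__":
    read = "A" * 100 + "C" * 50 + "G" * 100
    scores = [0.9, 0.95, 0.2, 0.9, 0.85]  # one score per 50-base segment
    for piece in scrub_read(read, scores):
        print(len(piece), piece[:10])
```

In this sketch a read with one low-scoring segment in the middle is split into two shorter reads, and any piece shorter than `min_len` is discarded, which mirrors the idea of removing low-quality segments before assembly.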
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- National Institutes of Health (NIH); National Science Foundation (NSF); USDOE Office of Science (SC)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1581387
- Journal Information:
- BMC Bioinformatics, Vol. 20, Issue 1; ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
Similar Records
Optimizing de novo genome assembly from PCR-amplified metagenomes · Journal Article · Thu Dec 27 23:00:00 EST 2018 · OSTI ID:1559171
Extreme-Scale De Novo Genome Assembly · Journal Article · Mon Sep 25 20:00:00 EDT 2017 · OSTI ID:1398520
Optimizing de novo genome assembly from PCR-amplified metagenomes · Journal Article · Thu Dec 27 23:00:00 EST 2018 · OSTI ID:1766468