BBMerge – Accurate paired shotgun read merging via overlap
Abstract
Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
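To make the idea of overlap-based merging concrete, here is a minimal Python sketch. It is not BBMerge's algorithm: the function names, the `min_overlap` and `max_mismatch_rate` parameters, and the rule of accepting the longest overlap under a mismatch threshold are all illustrative assumptions. It also ignores base qualities, inserts shorter than the read length, and the k-mer-based handling of non-overlapping pairs mentioned above.

```python
# Illustrative sketch only; names and thresholds are assumptions, not BBMerge's.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=12, max_mismatch_rate=0.1):
    """Merge a read pair by overlapping the end of read1 with the start of
    the reverse-complemented read2.

    Tries overlaps from longest to shortest and accepts the first one whose
    mismatch count is at most max_mismatch_rate * overlap. Where the reads
    disagree inside the overlap, the base from read1 is kept.
    Returns the merged sequence, or None if no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]   # end of the forward read
        head = mate[:overlap]     # start of the reoriented reverse read
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    # Hypothetical 40 bp fragment sequenced with 28 bp reads from each end,
    # so the two reads share a 16 bp overlap.
    fragment = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGCCATGGA"
    r1 = fragment[:28]
    r2 = reverse_complement(fragment[-28:])
    print(merge_pair(r1, r2) == fragment)
```

Preferring the longest acceptable overlap is only one possible design choice. A merger that aims for high accuracy would more plausibly weigh candidate overlaps against each other and use quality scores, since a spurious overlap produces exactly the kind of incorrect merge the abstract warns about.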
- Authors:
- Bushnell, Brian; Rood, Jonathan; Singer, Esther
- Publication Date:
- 2017-10-26
- Research Org.:
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1405052
- Alternate Identifier(s):
- OSTI ID: 1407458
- Report Number(s):
- NREL/JA-2C00-70437; Journal ID: ISSN 1932-6203; 10.1371/journal.pone.0185056
- Grant/Contract Number:
- AC02-05CH11231; AC36-08GO28308
- Resource Type:
- Published Article
- Journal Name:
- PLoS ONE
- Additional Journal Information:
- Journal Name: PLoS ONE; Journal Volume: 12; Journal Issue: 10; Journal ID: ISSN 1932-6203
- Publisher:
- Public Library of Science (PLoS)
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; metagenomics; sequence alignment; shotgun sequencing; sequence assembly tools; computer software; bacteria; genome annotation
Citation Formats
Bushnell, Brian, Rood, Jonathan, Singer, Esther, and Biggs, Patrick Jon (ed.). BBMerge – Accurate paired shotgun read merging via overlap. United States: N. p., 2017. Web. doi:10.1371/journal.pone.0185056.
Bushnell, Brian, Rood, Jonathan, Singer, Esther, & Biggs, Patrick Jon (ed.). BBMerge – Accurate paired shotgun read merging via overlap. United States. https://doi.org/10.1371/journal.pone.0185056
Bushnell, Brian, Rood, Jonathan, Singer, Esther, and Biggs, Patrick Jon (ed.). 2017. "BBMerge – Accurate paired shotgun read merging via overlap". United States. https://doi.org/10.1371/journal.pone.0185056.
@article{osti_1405052,
title = {BBMerge – Accurate paired shotgun read merging via overlap},
author = {Bushnell, Brian and Rood, Jonathan and Singer, Esther},
editor = {Biggs, Patrick Jon},
abstractNote = {Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.},
doi = {10.1371/journal.pone.0185056},
journal = {PLoS ONE},
number = 10,
volume = 12,
place = {United States},
year = {2017},
month = {10}
}
https://doi.org/10.1371/journal.pone.0185056