DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: BBMerge – Accurate paired shotgun read merging via overlap

Abstract

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
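To illustrate the general idea of merging a read pair "via overlap," here is a minimal Python sketch. It is not BBMerge's algorithm. It assumes uppercase A/C/G/T/N reads and a fragment at least as long as each read, so the 3' end of read 1 overlaps the start of the reverse complement of read 2. It also resolves disagreeing bases to N rather than using quality scores, and it rejects pairs whose best overlap is not unique. The names `merge_pair`, `min_overlap` and `max_mismatch_rate` are hypothetical, and the k-mer gap-filling mode for non-overlapping pairs is not shown.

```python
"""Minimal sketch of overlap-based paired-read merging (not BBMerge's algorithm)."""

from typing import Optional

# Complement table; assumes uppercase A/C/G/T/N input (other characters raise KeyError).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(
    read1: str,
    read2: str,
    min_overlap: int = 12,
    max_mismatch_rate: float = 0.1,
) -> Optional[str]:
    """Merge a pair by overlapping read1's 3' end with reverse-complemented read2.

    Hypothetical helper: assumes the fragment is at least as long as each read,
    so the overlap is a suffix of read1 and a prefix of reverse_complement(read2).
    Returns the merged sequence, or None if no unique acceptable overlap exists.
    """
    r2rc = reverse_complement(read2)
    candidates = []  # (mismatches, overlap_length) for every acceptable overlap
    for overlap in range(min_overlap, min(len(read1), len(r2rc)) + 1):
        tail = read1[len(read1) - overlap:]
        head = r2rc[:overlap]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            candidates.append((mismatches, overlap))

    if not candidates:
        return None  # no overlap passes the mismatch threshold
    candidates.sort()
    best_mismatches, best_overlap = candidates[0]
    # Reject ambiguous pairs: two different overlaps fit equally well.
    if len(candidates) > 1 and candidates[1][0] == best_mismatches:
        return None

    tail = read1[len(read1) - best_overlap:]
    head = r2rc[:best_overlap]
    # Simplified consensus: disagreeing bases become N (real tools weigh base qualities).
    consensus = "".join(a if a == b else "N" for a, b in zip(tail, head))
    return read1[: len(read1) - best_overlap] + consensus + r2rc[best_overlap:]
```

Returning None for ties between candidate overlaps trades merge rate for accuracy. This matches the abstract's point that incorrect merges degrade downstream analysis. A production merger would also handle fragments shorter than the read length (adapter read-through) and use quality-aware scoring.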

Authors:
Bushnell, Brian; Rood, Jonathan; Singer, Esther
Publication Date:
October 26, 2017
Research Org.:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1405052
Alternate Identifier(s):
OSTI ID: 1407458
Report Number(s):
NREL/JA-2C00-70437
Journal ID: ISSN 1932-6203; 10.1371/journal.pone.0185056
Grant/Contract Number:  
AC02-05CH11231; AC36-08GO28308
Resource Type:
Published Article
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Name: PLoS ONE; Journal Volume: 12; Journal Issue: 10; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; metagenomics; sequence alignment; shotgun sequencing; sequence assembly tools; computer software; bacteria; genome annotation

Citation Formats

Bushnell, Brian, Rood, Jonathan, Singer, Esther, and Biggs, ed., Patrick Jon. BBMerge – Accurate paired shotgun read merging via overlap. United States: N. p., 2017. Web. doi:10.1371/journal.pone.0185056.
Bushnell, Brian, Rood, Jonathan, Singer, Esther, & Biggs, ed., Patrick Jon. BBMerge – Accurate paired shotgun read merging via overlap. United States. https://doi.org/10.1371/journal.pone.0185056
Bushnell, Brian, Rood, Jonathan, Singer, Esther, and Biggs, ed., Patrick Jon. 2017. "BBMerge – Accurate paired shotgun read merging via overlap". United States. https://doi.org/10.1371/journal.pone.0185056.
@article{osti_1405052,
title = {BBMerge – Accurate paired shotgun read merging via overlap},
author = {Bushnell, Brian and Rood, Jonathan and Singer, Esther},
editor = {Biggs, Patrick Jon},
abstractNote = {Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.},
doi = {10.1371/journal.pone.0185056},
journal = {PLoS ONE},
number = 10,
volume = 12,
place = {United States},
year = {2017},
month = oct
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1371/journal.pone.0185056

Citation Metrics:
Cited by: 590 works
Citation information provided by Web of Science

Works referenced in this record:

IMG/M: a data management and analysis system for metagenomes
journal, December 2007

  • Markowitz, V. M.; Ivanova, N. N.; Szeto, E.
  • Nucleic Acids Research, Vol. 36, Issue Database
  • DOI: 10.1093/nar/gkm869

PEAR: a fast and accurate Illumina Paired-End reAd mergeR
journal, October 2013


QUAST: quality assessment tool for genome assemblies
journal, February 2013


Initial sequencing and analysis of the human genome
journal, February 2001

  • Lander, Eric S.; Linton, Lauren M.
  • Nature, Vol. 409, Issue 6822, p. 860-921
  • DOI: 10.1038/35057062

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly
journal, October 2012


ELOPER: elongation of paired-end reads as a pre-processing tool for improved de novo genome assembly
journal, April 2013


SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
journal, May 2012

  • Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry
  • Journal of Computational Biology, Vol. 19, Issue 5
  • DOI: 10.1089/cmb.2012.0021

Next generation sequencing data of a defined microbial mock community
journal, September 2016

  • Singer, Esther; Andreopoulos, Bill; Bowers, Robert M.
  • Scientific Data, Vol. 3, Issue 1
  • DOI: 10.1038/sdata.2016.81

Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome
journal, October 2007


The new paradigm of flow cell sequencing
journal, May 2008


leeHom: adaptor trimming and merging for Illumina sequencing reads
journal, August 2014

  • Renaud, Gabriel; Stenzel, Udo; Kelso, Janet
  • Nucleic Acids Research, Vol. 42, Issue 18
  • DOI: 10.1093/nar/gku699

FLASH: fast length adjustment of short reads to improve genome assemblies
journal, September 2011


Scanning the human genome at kilobase resolution
journal, February 2008


PANDAseq: paired-end assembler for illumina sequences
journal, January 2012

  • Masella, Andre P.; Bartram, Andrea K.; Truszkowski, Jakub M.
  • BMC Bioinformatics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2105-13-31

Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome
journal, September 2005


Search and clustering orders of magnitude faster than BLAST
journal, August 2010


Gut Microbiome Metagenomics Analysis Suggests a Functional Model for the Development of Autoimmunity for Type 1 Diabetes
journal, October 2011


Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation
journal, January 2005

  • Ng, Patrick; Wei, Chia-Lin; Sung, Wing-Kin
  • Nature Methods, Vol. 2, Issue 2
  • DOI: 10.1038/nmeth733