Title: BBMerge – Accurate paired shotgun read merging via overlap

Journal Article · PLoS ONE

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
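The general idea of merging a read pair via overlap can be illustrated with a minimal Python sketch. This is not BBMerge's algorithm: the function names (`reverse_complement`, `merge_pair`) and the parameters `min_overlap` and `max_mismatch_frac` are hypothetical, chosen only for illustration. The sketch ignores base quality scores, adapter read-through on short inserts, and the k-mer-based gap assembly mentioned in the abstract; it simply looks for the suffix of read 1 that best matches a prefix of the reverse-complemented read 2.

```python
# Illustrative only: a naive overlap merger, not the method used by BBMerge.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair if a suffix of read1 overlaps a prefix of
    reverse_complement(read2); return the merged sequence or None.

    min_overlap and max_mismatch_frac are assumed thresholds for this sketch.
    """
    r2 = reverse_complement(read2)
    best_key = None   # (mismatch fraction, -overlap length); smaller is better
    best_olen = None
    for olen in range(min_overlap, min(len(read1), len(r2)) + 1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], r2[:olen]))
        frac = mismatches / olen
        if frac > max_mismatch_frac:
            continue
        key = (frac, -olen)  # prefer fewer mismatches, then longer overlap
        if best_key is None or key < best_key:
            best_key, best_olen = key, olen
    if best_olen is None:
        return None
    # Keep read1 in full and append the non-overlapping tail of r2.
    return read1 + r2[best_olen:]


if __name__ == "__main__":
    # A made-up 40 bp fragment sequenced from both ends with 25 bp reads,
    # so the two reads overlap by 10 bases.
    fragment = "ATGGCGTACCTTAGGCATCGAATTCCGGAGTCTAGACCTA"
    read1 = fragment[:25]
    read2 = reverse_complement(fragment[-25:])
    print(merge_pair(read1, read2) == fragment)
```

Picking the candidate with the lowest mismatch fraction, rather than the first acceptable one, is a simplification that keeps the sketch short; production tools typically weigh quality scores and guard against ambiguous overlaps in repetitive sequence.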

Research Organization:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231; AC36-08GO28308
OSTI ID:
1405052
Alternate ID(s):
OSTI ID: 1407458
Report Number(s):
NREL/JA-2C00-70437; 10.1371/journal.pone.0185056
Journal Information:
Journal Name: PLoS ONE; Journal Volume: 12; Journal Issue: 10; ISSN 1932-6203
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 590 works
Citation information provided by Web of Science
