OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: BBMerge – Accurate paired shotgun read merging via overlap

Abstract

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
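
For readers unfamiliar with the general idea of merging a read pair by overlap, the following is a minimal Python sketch. It is not BBMerge's algorithm: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the rule of keeping the forward read's bases in the overlap are all assumptions made for illustration, and base qualities are ignored. The k-mer based gap assembly for non-overlapping pairs mentioned in the abstract is not shown.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(read1: str, read2: str,
               min_overlap: int = 8,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a read pair by overlap (illustrative sketch, not BBMerge).

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented first. The longest overlap whose mismatch
    fraction is within the (hypothetical) threshold is accepted.
    """
    r2 = reverse_complement(read2)
    longest = min(len(read1), len(r2))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]   # end of the forward read
        head = r2[:overlap]       # start of the reoriented reverse read
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            # Keep read1's bases in the overlap, then append the rest of r2.
            return read1 + r2[overlap:]
    return None  # no acceptable overlap found


# Usage: a 30 bp fragment sequenced as two 20 bp reads overlapping by 10 bp.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
forward = fragment[:20]
reverse = reverse_complement(fragment[10:])
merged = merge_pair(forward, reverse)
```

A production merger would also weigh base qualities and guard against spurious overlaps in repetitive sequence; this sketch omits both.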

Authors:
Bushnell, Brian [1]; Rood, Jonathan [2]; Singer, Esther [1]
  1. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  2. National Renewable Energy Lab. (NREL), Golden, CO (United States)
Publication Date:
2017-10-26
Research Org.:
USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); National Renewable Energy Lab. (NREL), Golden, CO (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1405052
Alternate Identifier(s):
OSTI ID: 1407458
Report Number(s):
NREL/JA-2C00-70437
Journal ID: ISSN 1932-6203
Grant/Contract Number:
AC02-05CH11231; AC36-08GO28308
Resource Type:
Journal Article: Published Article
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Volume: 12; Journal Issue: 10; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; metagenomics; sequence alignment; shotgun sequencing; sequence assembly tools; computer software; bacteria; genome annotation

Citation Formats

Bushnell, Brian, Rood, Jonathan, and Singer, Esther. BBMerge – Accurate paired shotgun read merging via overlap. United States: N. p., 2017. Web. doi:10.1371/journal.pone.0185056.
Bushnell, Brian, Rood, Jonathan, & Singer, Esther. BBMerge – Accurate paired shotgun read merging via overlap. United States. doi:10.1371/journal.pone.0185056.
Bushnell, Brian, Rood, Jonathan, and Singer, Esther. 2017. "BBMerge – Accurate paired shotgun read merging via overlap". United States. doi:10.1371/journal.pone.0185056.
@article{osti_1405052,
title = {BBMerge – Accurate paired shotgun read merging via overlap},
author = {Bushnell, Brian and Rood, Jonathan and Singer, Esther},
doi = {10.1371/journal.pone.0185056},
journal = {PLoS ONE},
number = 10,
volume = 12,
place = {United States},
year = {2017},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1371/journal.pone.0185056
