OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

Abstract

This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

Authors:
Utturkar, Sagar M. [1]; Klingeman, Dawn M. [2]; Hurt, Jr., Richard A. [3]; Brown, Steven D. [4]
  1. Univ. of Tennessee, Knoxville, TN (United States); Purdue Univ., West Lafayette, IN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
Publication Date:
2017-07-18
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1376350
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Frontiers in Microbiology
Additional Journal Information:
Journal Volume: 8; Journal ID: ISSN 1664-302X
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; PacBio; Illumina; genome assembly; next-generation sequencing (NGS); repetitive DNA; Pilon; circlator

Citation Formats

Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., and Brown, Steven D. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. United States: N. p., 2017. Web. doi:10.3389/fmicb.2017.01272.
Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., & Brown, Steven D. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. United States. doi:10.3389/fmicb.2017.01272.
Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., and Brown, Steven D. 2017. "A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies". United States. doi:10.3389/fmicb.2017.01272. https://www.osti.gov/servlets/purl/1376350.
@article{osti_1376350,
title = {A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies},
author = {Utturkar, Sagar M. and Klingeman, Dawn M. and Hurt, Jr., Richard A. and Brown, Steven D.},
abstractNote = {This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.},
doi = {10.3389/fmicb.2017.01272},
journal = {Frontiers in Microbiology},
volume = {8},
place = {United States},
year = {2017},
month = {jul}
}
