DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

Abstract

This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and the regions not assembled by automated software were analyzed. Gaps within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties: GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
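
As a rough illustration of two of the gap properties named in the abstract (gap length and GC content), the following Python sketch summarizes a set of gap sequences. It is not the authors' pipeline: the function names, the choice to ignore ambiguous bases such as N, and the example sequences are all assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a sequence, counting only A, C, G, and T.

    Ambiguous bases (for example N) are ignored; this is an assumption of
    this sketch, not something specified in the abstract.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def summarize_gaps(gaps: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, gap length, GC fraction) for each gap sequence."""
    return [(name, len(seq), round(gc_content(seq), 3)) for name, seq in gaps.items()]


# Hypothetical gap sequences, invented for this example only.
example_gaps = {
    "gap_1": "ATGCGCGCTA",
    "gap_2": "ATATATNNAT",
}

for name, length, gc in summarize_gaps(example_gaps):
    print(f"{name}\tlength={length}\tGC={gc}")
```

In practice the gap sequences would be read from a FASTA file, and read coverage would come from an alignment of the reads against the finished genome; both steps are omitted here.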

Authors:
Utturkar, Sagar M. [1]; Klingeman, Dawn M. [2]; Hurt, Jr., Richard A. [3]; Brown, Steven D. [4]
  1. Univ. of Tennessee, Knoxville, TN (United States); Purdue Univ., West Lafayette, IN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
Publication Date:
July 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1376350
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Frontiers in Microbiology
Additional Journal Information:
Journal Volume: 8; Journal ID: ISSN 1664-302X
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; PacBio; Illumina; genome assembly; next-generation sequencing (NGS); repetitive DNA; Pilon; circlator

Citation Formats

Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., and Brown, Steven D. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. United States: N. p., 2017. Web. doi:10.3389/fmicb.2017.01272.
Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., & Brown, Steven D. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. United States. doi:10.3389/fmicb.2017.01272.
Utturkar, Sagar M., Klingeman, Dawn M., Hurt, Jr., Richard A., and Brown, Steven D. 2017. "A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies". United States. doi:10.3389/fmicb.2017.01272. https://www.osti.gov/servlets/purl/1376350.
@article{osti_1376350,
title = {A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies},
author = {Utturkar, Sagar M. and Klingeman, Dawn M. and Hurt, Jr., Richard A. and Brown, Steven D.},
abstractNote = {This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and the regions not assembled by automated software were analyzed. Gaps within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties: GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.},
doi = {10.3389/fmicb.2017.01272},
journal = {Frontiers in Microbiology},
volume = 8,
place = {United States},
year = {2017},
month = {7}
}

Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science

    Works referencing / citing this record:

    Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains
    journal, January 2018

