U.S. Department of Energy
Office of Scientific and Technical Information

A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

Journal Article · Frontiers in Microbiology
 [1];  [2];  [3];  [4]
  1. Univ. of Tennessee, Knoxville, TN (United States); Purdue Univ., West Lafayette, IN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  4. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); BioEnergy Science Center, Oak Ridge, TN (United States)
This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and the regions not assembled by automated software were analyzed. Gaps within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties: GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as a cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
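As an illustration of two of the gap properties named in the abstract (gap length and GC content), the following minimal Python sketch summarizes a single gap sequence relative to a genome-wide GC fraction. It is not the authors' pipeline: the function names `gc_content` and `summarize_gap`, the example sequence, and the choice to ignore ambiguous bases such as N are all assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    Returns 0.0 when the sequence has no unambiguous bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def summarize_gap(gap_seq: str, genome_gc: float) -> dict:
    """Report the length and GC content of a gap, plus its GC offset from the genome."""
    gap_gc = gc_content(gap_seq)
    return {
        "length": len(gap_seq),          # gap length in bases, including any N
        "gc": gap_gc,                    # GC fraction of the gap itself
        "gc_delta": gap_gc - genome_gc,  # positive means GC-richer than the genome
    }


if __name__ == "__main__":
    # Hypothetical gap sequence and genome-wide GC fraction.
    print(summarize_gap("ATGCGCGCNNATAT", genome_gc=0.45))
```

Read coverage and secondary-structure stability, the other properties listed, would need read alignments and a folding tool respectively and are outside the scope of this sketch.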
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1376350
Journal Information:
Frontiers in Microbiology, Vol. 8; ISSN 1664-302X
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
