OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Variant profiling of evolving prokaryotic populations

Journal Article · PeerJ
DOI: https://doi.org/10.7717/peerj.2997 · OSTI ID: 1628929
 [1];  [2];  [3];  [2];  [2];  [2]
  1. Univ. of Vienna (Austria). Division of Computational Systems Biology. Dept. of Microbiology and Ecosystems Science
  2. Univ. of Vienna (Austria). Division of Microbial Ecology. Dept. of Microbiology and Ecosystems Science
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)

Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies have improved speed and coverage and thus allow for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low-frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, but at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that the best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for routine use in experimental evolution experiments or clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap.
In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
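
The consensus idea described in the abstract (keep a variant only if at least two tools report it, and only above a minimum relative abundance of 2%) can be illustrated with a minimal sketch. This is not VarCap's code: the `Call` record, the tool names, the `consensus_calls` function, and the choice to average the frequencies reported by the tools are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One variant call from one tool (hypothetical record layout)."""
    tool: str
    pos: int
    ref: str
    alt: str
    freq: float  # relative abundance of the variant allele, 0..1


def consensus_calls(calls, min_tools=2, min_freq=0.02):
    """Keep variants reported by at least `min_tools` distinct tools
    whose mean reported frequency is at least `min_freq`.

    Averaging the per-tool frequencies is an assumption of this sketch.
    """
    by_variant = defaultdict(dict)
    for c in calls:
        # Key on position and alleles; store one frequency per tool.
        by_variant[(c.pos, c.ref, c.alt)][c.tool] = c.freq

    kept = {}
    for variant, per_tool in by_variant.items():
        mean_freq = sum(per_tool.values()) / len(per_tool)
        if len(per_tool) >= min_tools and mean_freq >= min_freq:
            kept[variant] = mean_freq
    return kept


# Toy input with made-up tool names and positions.
calls = [
    Call("tool_a", 1200, "A", "G", 0.05),
    Call("tool_b", 1200, "A", "G", 0.07),
    Call("tool_a", 3400, "C", "T", 0.03),   # reported by one tool only
    Call("tool_a", 5600, "G", "GA", 0.01),
    Call("tool_b", 5600, "G", "GA", 0.01),  # below the 2% threshold
]
result = consensus_calls(calls)
print(sorted(result))
```

In this toy input only the variant at position 1200 survives: the call at 3400 lacks a second supporting tool, and the insertion at 5600 falls below the frequency threshold. Taking the union of tools instead (setting `min_tools=1`) would also keep the call at 3400, which mirrors the trade-off between sensitivity and false positives noted in the abstract.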

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1628929
Journal Information:
PeerJ, Vol. 5; ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
