Variant profiling of evolving prokaryotic populations
- Univ. of Vienna (Austria). Division of Computational Systems Biology. Dept. of Microbiology and Ecosystems Science
- Univ. of Vienna (Austria). Division of Microbial Ecology. Dept. of Microbiology and Ecosystems Science
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
Genomic heterogeneity of bacterial species is observed and studied in evolution experiments and clinical diagnostics, and occurs as micro-diversity in natural habitats. The challenge for genome research is to capture this heterogeneity accurately with the short sequencing reads currently in use. Recent advances in NGS technologies have improved speed and coverage and thus allow deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low-frequency alleles or haplotypes. However, false positive variant predictions caused by sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. To predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity is reached by a union of different tools, but at the price of more false positives. We identified possible reasons for false predictions and used this knowledge to improve accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that the best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for routine use in evolution experiments or clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap.
To provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
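The consensus step described in the abstract (keep a variant only if at least two tools report it, then require a minimum relative abundance of 2%) can be illustrated with a minimal Python sketch. This is not VarCap's code: the function name `consensus_variants`, the tuple layout of a call, the tool names, and the choice to average the per-tool frequencies are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_TOOLS = 2          # "intersection of at least two tools per variant"
MIN_FREQUENCY = 0.02   # 2% minimum relative abundance from the abstract


def consensus_variants(calls_by_tool, min_tools=MIN_TOOLS,
                       min_frequency=MIN_FREQUENCY):
    """Return variants supported by enough tools and frequent enough.

    calls_by_tool maps a (hypothetical) tool name to a list of
    (position, ref, alt, frequency) tuples.
    """
    # variant key -> {tool name: frequency reported by that tool}
    support = defaultdict(dict)
    for tool, calls in calls_by_tool.items():
        for position, ref, alt, frequency in calls:
            support[(position, ref, alt)][tool] = frequency

    accepted = {}
    for key, freqs in support.items():
        if len(freqs) < min_tools:
            continue  # reported by too few tools
        # Assumption: summarize the tools' estimates by their mean.
        mean_frequency = sum(freqs.values()) / len(freqs)
        if mean_frequency >= min_frequency:
            accepted[key] = mean_frequency
    return accepted


# Hypothetical calls from two tools on the same sample.
example_calls = {
    "tool_a": [(100, "A", "G", 0.05), (200, "C", "T", 0.01)],
    "tool_b": [(100, "A", "G", 0.07), (200, "C", "T", 0.01),
               (300, "G", "A", 0.50)],
}
result = consensus_variants(example_calls)
```

In this example only the variant at position 100 survives: the call at position 200 is reported by both tools but falls below the 2% threshold, and the call at position 300 is reported by a single tool. The other post-filters named in the abstract (coverage, genomic environment, co-localization) are not modeled here.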
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1628929
- Journal Information:
- PeerJ, Vol. 5; ISSN 2167-8359
- Publisher:
- PeerJ Inc.
- Country of Publication:
- United States
- Language:
- English