U.S. Department of Energy
Office of Scientific and Technical Information

Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation

Journal Article · PeerJ
DOI: https://doi.org/10.7717/peerj.11447 · OSTI ID: 1813767
 [1];  [1];  [1];  [2];  [1];  [1];  [3];  [4];  [5];  [1]
  1. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science
  2. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science; The Ohio State Univ., Columbus, OH (United States). Byrd Polar and Climate Research Center
  3. Viromica Consulting, Santiago (Chile)
  4. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science; The Ohio State Univ., Columbus, OH (United States). Infectious Diseases Inst.
  5. USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds, and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs. bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and BLAST-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and BLAST-based tools for short contigs came at the cost of increased false positives (sometimes up to ~5% for virome samples and ~75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. Furthermore, for viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignments ranging from ~95% (whole genomes) down to ~80% (3 kbp genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree, and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments rather than full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Gordon and Betty Moore Foundation; National Science Foundation (NSF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1813767
Journal Information:
PeerJ, Vol. 9; ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English

