Title: Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation

Journal Article · PeerJ
DOI: https://doi.org/10.7717/peerj.11447 · OSTI ID: 1813767
 [1];  [1];  [1];  [2];  [1];  [1];  [3];  [4];  [5];  [1]
  1. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science
  2. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science; The Ohio State Univ., Columbus, OH (United States). Byrd Polar and Climate Research Center
  3. Viromica Consulting, Santiago (Chile)
  4. The Ohio State Univ., Columbus, OH (United States); The Ohio State Univ., Columbus, OH (United States). Center of Microbiome Science; The Ohio State Univ., Columbus, OH (United States). Infectious Diseases Inst.
  5. USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds, and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs. bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and BLAST-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance gain of k-mer- and BLAST-based tools for short contigs came at the cost of increased false positives (sometimes up to ~5% for virome and ~75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. Furthermore, for viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignments ranging from ~95% (whole genomes) down to ~80% (3 kbp-sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree, and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments rather than full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF); Gordon and Betty Moore Foundation
Grant/Contract Number:
AC02-05CH11231; OCE1829831; ABI1758974; 3790
OSTI ID:
1813767
Journal Information:
PeerJ, Vol. 9; ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
