DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets

Journal Article · ISME Communications

Abstract: Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) by training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 significantly improved on its predecessor in scalability, usability, and flexibility, and it will allow user-defined custom training datasets as archaeal virus discovery progresses. In benchmarking, a model trained with viral sequences from hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses, with a false detection rate below 2%, at a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
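The benchmarking figures in the abstract (share of archaeal viruses correctly classified, and false detection rate, at a given prediction threshold) can be illustrated with a minimal sketch. This is not MArVD2 code. The function name `evaluate_threshold`, the rule that a score at or above the threshold counts as an archaeal call, and the definition of false detection rate as the fraction of archaeal calls that are actually bacterial are all assumptions made for illustration; the record above does not define them. The scores are synthetic toy values, not data from the paper.

```python
def evaluate_threshold(scores, is_archaeal, threshold=0.8):
    """Return (recall, false_detection_rate) for calls made at `threshold`.

    scores: per-sequence probability of being an archaeal virus
            (hypothetical classifier output).
    is_archaeal: matching list of true labels (True = archaeal virus).
    Assumption: a sequence is called archaeal when score >= threshold.
    Assumption: false detection rate = FP / (TP + FP).
    """
    tp = fp = fn = 0
    for score, truth in zip(scores, is_archaeal):
        called = score >= threshold
        if called and truth:
            tp += 1          # archaeal virus correctly called
        elif called and not truth:
            fp += 1          # bacterial virus wrongly called archaeal
        elif truth:
            fn += 1          # archaeal virus missed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_detection_rate = fp / (tp + fp) if (tp + fp) else 0.0
    return recall, false_detection_rate


# Synthetic toy example (not data from the paper).
scores = [0.95, 0.90, 0.85, 0.60, 0.82, 0.30, 0.10, 0.05]
is_archaeal = [True, True, True, True, False, False, False, False]
recall, false_detection_rate = evaluate_threshold(scores, is_archaeal)
```

Raising the threshold generally trades recall for a lower false detection rate, which is why a reported operating point such as 80% is tied to a specific benchmarking dataset.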

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231; SC0014664
OSTI ID:
1996601
Journal Information:
ISME Communications, Vol. 3, Issue 1; ISSN 2730-6151
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
