DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families

Journal Article · Bioinformatics

Abstract

Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.

Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class, a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. On the RefSeq database, VPF-Class achieved an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, using a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, using a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86%, using a membership ratio of 0.6 and a confidence score of 0.8. Moreover, over 817K of the 1 million viral contigs in the Global Ocean Virome dataset were classified.

Availability and implementation: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.

Supplementary information: Supplementary data are available at Bioinformatics online.

Sponsoring Organization:
USDOE
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1810588
Journal Information:
Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 13; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English

