Title: VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families

Journal Article · Bioinformatics

Abstract

Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.

Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86%, considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified.

Availability and implementation: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.

Supplementary information: Supplementary data are available at Bioinformatics online.
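The abstract reports each accuracy at a specific pair of cut-offs (a membership ratio and a confidence score) and gives contig counts that are easier to compare as fractions. The following Python sketch illustrates both ideas under stated assumptions: the `Prediction` record, its field names, the example contigs and labels, and the choice of inclusive (`>=`) thresholds are hypothetical and are not taken from the VPF-Class output format or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one contig-level prediction (not the tool's real format)."""
    contig: str
    label: str
    membership_ratio: float
    confidence_score: float


def filter_predictions(predictions, min_membership_ratio, min_confidence_score):
    """Keep predictions that meet both cut-offs (inclusive comparison is an assumption)."""
    return [
        p for p in predictions
        if p.membership_ratio >= min_membership_ratio
        and p.confidence_score >= min_confidence_score
    ]


def classified_fraction(classified, total):
    """Fraction of contigs that received a classification."""
    if total <= 0:
        raise ValueError("total must be positive")
    return classified / total


# Made-up example predictions, for illustration only.
example = [
    Prediction("contig_1", "genus_A", 0.75, 0.90),
    Prediction("contig_2", "genus_B", 0.25, 0.40),
    Prediction("contig_3", "genus_C", 0.10, 0.95),
]

# Cut-offs quoted in the abstract for genus-level taxonomy on RefSeq (0.2 and 0.2).
kept_taxonomy = filter_predictions(example, 0.2, 0.2)

# Cut-offs quoted for host prediction on the prophages dataset (0.6 and 0.8).
kept_prophage_host = filter_predictions(example, 0.6, 0.8)

# Counts quoted in the abstract; "over 461K" and "over 817K" make those two lower bounds.
imgvr_genus = classified_fraction(363_000, 731_000)
imgvr_host = classified_fraction(461_000, 731_000)
gov_classified = classified_fraction(817_000, 1_000_000)

print(f"IMG/VR genus-level classification: {imgvr_genus:.1%}")
print(f"IMG/VR host prediction (at least): {imgvr_host:.1%}")
print(f"Global Ocean Virome classified (at least): {gov_classified:.1%}")
```

Using the rounded counts from the abstract, roughly half of the IMG/VR contigs were classified at the genus level, a little under two thirds received a host prediction, and over four fifths of the Global Ocean Virome contigs were classified.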

Research Organization:
USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); Ministerio de Ciencia e Innovación (MCI); Agencia Estatal de investigación (AEI); European Regional Development Funds (ERDF)
Grant/Contract Number:
DE-AC02-05CH11231
OSTI ID:
1810588
Alternate ID(s):
OSTI ID: 1778483; OSTI ID: 1904109
Journal Information:
Bioinformatics, Vol. 37, Issue 13; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
