DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families

Abstract

Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of their putative host(s). Both steps rely mainly on assigning viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used to robustly identify new viral sequences in large metagenomic datasets. Despite the importance of VPFs for viral discovery, they have not yet been explored for determining viral taxonomy and host targets.

Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class, a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we classified 363K contigs at the genus level and predicted the host of over 461K contigs. On the RefSeq database, VPF-Class achieved an accuracy of nearly 100% when classifying dsDNA viruses, ssDNA viruses and retroviruses at the genus level, with a membership ratio and a confidence score of 0.2 each. Host prediction accuracy was 86.4%, also at the genus level, with a membership ratio of 0.3 and a confidence score of 0.5. On the prophages dataset, host prediction accuracy was 86% with a membership ratio of 0.6 and a confidence score of 0.8. Moreover, over 817K of the 1 million viral contigs in the Global Ocean Virome dataset were classified.

Availability and implementation: VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.

Supplementary information: Supplementary data are available at Bioinformatics online.
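The abstract reports accuracies at specific membership-ratio and confidence-score cutoffs without defining the two quantities. As a rough illustration of how two such cutoffs can gate a genus-level call, here is a minimal Python sketch. Its definitions are hypothetical simplifications, not VPF-Class's actual formulas (the paper and the vpf-tools repository give those): each protein is assumed to have at most one best-hit VPF, each classified VPF is assumed to carry a single label, and the function names `score_contig` and `classify_contig` are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def score_contig(
    protein_vpfs: List[Optional[str]],
    vpf_labels: Dict[str, str],
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (membership_ratio, confidence)} for one contig.

    HYPOTHETICAL simplified definitions, not VPF-Class's exact formulas:
      membership_ratio = proteins whose VPF carries `label` / all proteins
      confidence       = proteins whose VPF carries `label` / proteins with any labelled VPF
    """
    total = len(protein_vpfs)
    # Count, per label, the proteins whose best-hit VPF carries that label.
    # Proteins with no hit (None) or hitting an unclassified VPF are skipped.
    counts = Counter(
        vpf_labels[vpf]
        for vpf in protein_vpfs
        if vpf is not None and vpf in vpf_labels
    )
    labelled = sum(counts.values())
    if total == 0 or labelled == 0:
        return {}
    return {label: (n / total, n / labelled) for label, n in counts.items()}


def classify_contig(
    protein_vpfs: List[Optional[str]],
    vpf_labels: Dict[str, str],
    min_membership: float,
    min_confidence: float,
) -> Optional[str]:
    """Return the best-supported label passing both cutoffs, or None."""
    scores = score_contig(protein_vpfs, vpf_labels)
    passing = [
        (membership, confidence, label)
        for label, (membership, confidence) in scores.items()
        if membership >= min_membership and confidence >= min_confidence
    ]
    if not passing:
        return None
    # Highest membership ratio wins; ties are broken by confidence,
    # then by label (lexicographically largest).
    return max(passing)[2]


if __name__ == "__main__":
    # Toy contig with 10 proteins: 4 hit "GenusA" VPFs, 1 hits a "GenusB" VPF, 5 have no hit.
    proteins = ["vpf1", "vpf1", "vpf2", "vpf3", "vpf4", None, None, None, None, None]
    labels = {"vpf1": "GenusA", "vpf2": "GenusA", "vpf3": "GenusA", "vpf4": "GenusB"}
    print(classify_contig(proteins, labels, 0.2, 0.2))  # GenusA (ratio 0.4, confidence 0.8)
    print(classify_contig(proteins, labels, 0.6, 0.8))  # None (ratio 0.4 < 0.6)
```

The toy run shows the trade-off behind the different cutoff pairs quoted in the abstract: stricter cutoffs leave more contigs unassigned in exchange for higher-confidence calls.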

Authors:
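Pons, Joan Carles; Paez-Espino, David; Riera, Gabriel; Ivanova, Natalia; Kyrpides, Nikos C.; Llabrés, Mercè; Valencia, Alfonso (ed.)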
Publication Date:
Wed Jan 20 2021
Research Org.:
USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC); Ministerio de Ciencia e Innovación (MCI); Agencia Estatal de Investigación (AEI); European Regional Development Fund (ERDF)
OSTI Identifier:
1810588
Alternate Identifier(s):
OSTI ID: 1778483; OSTI ID: 1904109
Grant/Contract Number:
DE-AC02-05CH11231
Resource Type:
Published Article
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 13; Journal ID: ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

MLA: Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, and Valencia, Alfonso (ed.). VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. United Kingdom: N. p., 2021. Web. doi:10.1093/bioinformatics/btab026.
APA: Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, & Valencia, Alfonso (ed.). (2021). VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. United Kingdom. https://doi.org/10.1093/bioinformatics/btab026
Chicago: Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, and Valencia, Alfonso (ed.). 2021. "VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families." United Kingdom. https://doi.org/10.1093/bioinformatics/btab026.
BibTeX:
@article{osti_1810588,
title = {VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families},
author = {Pons, Joan Carles and Paez-Espino, David and Riera, Gabriel and Ivanova, Natalia and Kyrpides, Nikos C. and Llabrés, Mercè},
editor = {Valencia, Alfonso},
abstractNote = {Abstract Motivation Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. Results In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class on 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% to classify dsDNA, ssDNA and retroviruses, at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. And, in the prophages dataset, the accuracy in host prediction was 86% considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. Availability and implementation The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. Supplementary information Supplementary data are available at Bioinformatics online.},
doi = {10.1093/bioinformatics/btab026},
journal = {Bioinformatics},
number = 13,
volume = 37,
place = {United Kingdom},
year = {2021},
month = jan
}
