VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families
Abstract
Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.
Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class on 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% to classify dsDNA, ssDNA and retroviruses, at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. And, in the prophages dataset, the accuracy in host prediction was 86% considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified.
Availability and implementation: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.
Supplementary information: Supplementary data are available at Bioinformatics online.
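The abstract reports accuracies for predictions that meet a minimum membership ratio and a minimum confidence score (0.2 for both in the genus-level RefSeq evaluation). A minimal sketch of that kind of threshold filter follows. It is not the VPF-Class implementation: the tab-separated layout, the column names (`contig`, `predicted_class`, `membership_ratio`, `confidence_score`), the function names, and the example rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names; the real VPF-Class output may differ.
MEMBERSHIP_COL = "membership_ratio"
CONFIDENCE_COL = "confidence_score"

# Made-up rows for illustration only; these are not real results.
EXAMPLE_TSV = (
    "contig\tpredicted_class\tmembership_ratio\tconfidence_score\n"
    "contig_1\tgenus_A\t0.85\t0.90\n"
    "contig_2\tgenus_B\t0.15\t0.70\n"
    "contig_3\tgenus_C\t0.40\t0.10\n"
)


def load_rows(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def filter_predictions(rows, min_membership=0.2, min_confidence=0.2):
    """Keep rows whose membership ratio and confidence score both meet the thresholds."""
    kept = []
    for row in rows:
        membership = float(row[MEMBERSHIP_COL])
        confidence = float(row[CONFIDENCE_COL])
        if membership >= min_membership and confidence >= min_confidence:
            kept.append(row)
    return kept


if __name__ == "__main__":
    for row in filter_predictions(load_rows(EXAMPLE_TSV)):
        print(row["contig"], row["predicted_class"])
```

Raising either threshold trades coverage for reliability, which is presumably why the abstract quotes different cut-offs for the different evaluations.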
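- Authors:
- Pons, Joan Carles; Paez-Espino, David; Riera, Gabriel; Ivanova, Natalia; Kyrpides, Nikos C.; Llabrés, Mercè; Valencia, Alfonso (ed.)
- Publication Date:
- 2021-01-20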
- Research Org.:
- USDOE Joint Genome Institute (JGI), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); Ministerio de Ciencia e Innovación (MCI); Agencia Estatal de investigación (AEI); European Regional Development Funds (ERDF)
- OSTI Identifier:
- 1810588
- Alternate Identifier(s):
- OSTI ID: 1778483; OSTI ID: 1904109
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Published Article
- Journal Name:
- Bioinformatics
- Additional Journal Information:
- Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 13; Journal ID: ISSN 1367-4803
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES
Citation Formats
Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, and Valencia, ed., Alfonso. VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. United Kingdom: N. p., 2021. Web. doi:10.1093/bioinformatics/btab026.
Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, & Valencia, ed., Alfonso. VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. United Kingdom. https://doi.org/10.1093/bioinformatics/btab026
Pons, Joan Carles, Paez-Espino, David, Riera, Gabriel, Ivanova, Natalia, Kyrpides, Nikos C., Llabrés, Mercè, and Valencia, ed., Alfonso. 2021. "VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families". United Kingdom. https://doi.org/10.1093/bioinformatics/btab026.
@article{osti_1810588,
title = {VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families},
author = {Pons, Joan Carles and Paez-Espino, David and Riera, Gabriel and Ivanova, Natalia and Kyrpides, Nikos C. and Llabrés, Mercè and Valencia, ed., Alfonso},
abstractNote = {Abstract Motivation Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. Results In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class on 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-class reported an accuracy of nearly 100% to classify dsDNA, ssDNA and retroviruses, at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. And, in the prophages dataset, the accuracy in host prediction was 86% considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. Availability and implementation The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. Supplementary information Supplementary data are available at Bioinformatics online.},
doi = {10.1093/bioinformatics/btab026},
journal = {Bioinformatics},
number = 13,
volume = 37,
place = {United Kingdom},
year = {2021},
month = {1}
}
https://doi.org/10.1093/bioinformatics/btab026
Works referenced in this record:
Uncovering Earth’s virome
journal, August 2016
- Paez-Espino, David; Eloe-Fadrosh, Emiley A.; Pavlopoulos, Georgios A.
- Nature, Vol. 536, Issue 7617
VICTOR: genome-based phylogeny and classification of prokaryotic viruses
journal, July 2017
- Meier-Kolthoff, Jan P.; Göker, Markus
- Bioinformatics, Vol. 33, Issue 21
Marine viruses — major players in the global ecosystem
journal, October 2007
- Suttle, Curtis A.
- Nature Reviews Microbiology, Vol. 5, Issue 10
Viral taxonomy derived from evolutionary genome relationships
journal, August 2019
- Dougan, Tyler J.; Quake, Stephen R.
- PLOS ONE, Vol. 14, Issue 8
Virus classification – where do you draw the line?
journal, July 2018
- Simmonds, Peter; Aiewsakun, Pakorn
- Archives of Virology, Vol. 163, Issue 8
ViralZone: a knowledge resource to understand virus diversity
journal, October 2010
- Hulo, Chantal; de Castro, Edouard; Masson, Patrick
- Nucleic Acids Research, Vol. 39, Issue suppl_1
Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses
journal, September 2016
- Roux, Simon; Brum, Jennifer R.; Dutilh, Bas E.
- Nature, Vol. 537, Issue 7622
IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes
journal, October 2018
- Chen, I-Min A.; Chu, Ken; Palaniappan, Krishna
- Nucleic Acids Research, Vol. 47, Issue D1
Diversity, evolution, and classification of virophages uncovered through global metagenomics
journal, December 2019
- Paez-Espino, David; Zhou, Jinglie; Roux, Simon
- Microbiome, Vol. 7, Issue 1
vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria
journal, January 2017
- Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem
- PeerJ, Vol. 5
Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks
journal, May 2019
- Bin Jang, Ho; Bolduc, Benjamin; Zablocki, Olivier
- Nature Biotechnology, Vol. 37, Issue 6
Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
journal, July 2015
- Roux, Simon; Hallam, Steven J.; Woyke, Tanja
- eLife, Vol. 4
HMMER web server: 2018 update
journal, June 2018
- Potter, Simon C.; Luciani, Aurélien; Eddy, Sean R.
- Nucleic Acids Research, Vol. 46, Issue W1
IMG/VR v.2.0: an integrated data management and analysis system for cultivated and environmental viral genomes
journal, November 2018
- Paez-Espino, David; Roux, Simon; Chen, I-Min A.
- Nucleic Acids Research, Vol. 47, Issue D1
Host Taxon Predictor - A Tool for Predicting Taxon of the Host of a Newly Discovered Virus
journal, March 2019
- Gałan, Wojciech; Bąk, Maciej; Jakubowska, Małgorzata
- Scientific Reports, Vol. 9, Issue 1
Giant virus diversity and host interactions through global metagenomics
journal, January 2020
- Schulz, Frederik; Roux, Simon; Paez-Espino, David
- Nature, Vol. 578, Issue 7795
Expression of animal virus genomes.
journal, January 1971
- Baltimore, D.
- Bacteriological Reviews, Vol. 35, Issue 3
Overview of Virus Metagenomic Classification Methods and Their Biological Applications
journal, April 2018
- Nooij, Sam; Schmitz, Dennis; Vennema, Harry
- Frontiers in Microbiology, Vol. 9
Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences
journal, November 2016
- Ahlgren, Nathan A.; Ren, Jie; Lu, Yang Young
- Nucleic Acids Research, Vol. 45, Issue 1
Evaluation of the genomic diversity of viruses infecting bacteria, archaea and eukaryotes using a common bioinformatic platform: steps towards a unified taxonomy
journal, September 2018
- Aiewsakun, Pakorn; Adriaenssens, Evelien M.; Lavigne, Rob
- Journal of General Virology, Vol. 99, Issue 9
Global Organization and Proposed Megataxonomy of the Virus World
journal, March 2020
- Koonin, Eugene V.; Dolja, Valerian V.; Krupovic, Mart
- Microbiology and Molecular Biology Reviews, Vol. 84, Issue 2
Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data
journal, July 2017
- Paez-Espino, David; Pavlopoulos, Georgios A.; Ivanova, Natalia N.
- Nature Protocols, Vol. 12, Issue 8
WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs
journal, July 2017
- Galiez, Clovis; Siebert, Matthias; Enault, François
- Bioinformatics, Vol. 33, Issue 19
Linking Virus Genomes with Host Taxonomy
journal, March 2016
- Mihara, Tomoko; Nishimura, Yosuke; Shimizu, Yugo
- Viruses, Vol. 8, Issue 3