OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: efam: an expanded, metaproteome-supported HMM profile database of viral protein families

Journal Article · Bioinformatics

Abstract

Motivation: Viruses infect, reprogram and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases.

Results: Here, we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240 311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from ‘conservative’ to ‘eXtremely Conservative’ resulted in 37 841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database led to an increase in viruses recovered from every tested OMZ virome by ∼24% on average (up to ∼42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem.

Availability and implementation: The resources are available on the iVirus platform at doi.org/10.25739/9vze-4143.

Supplementary information: Supplementary data are available at Bioinformatics online.
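As a rough illustration of how a searchable HMM profile database such as the one described above might be used downstream, the sketch below tallies, per contig, how many predicted proteins have a profile hit. It assumes the proteins were searched with HMMER's `hmmsearch --tblout`, whose table lists the target name in column 1 and the full-sequence E-value in column 5, and that protein IDs follow the Prodigal-style `<contig>_<gene index>` pattern. The profile names (`efam_000001`, etc.), the sample rows, the function name and the E-value cutoff are all hypothetical. This is not the scoring scheme used by VirSorter or by the authors.

```python
from collections import Counter
from typing import Iterable


def viral_hits_per_contig(tblout_lines: Iterable[str], max_evalue: float = 1e-5) -> Counter:
    """Count, per contig, the proteins with at least one profile hit at or below max_evalue.

    The cutoff is an illustrative assumption, not a value taken from the article.
    """
    hit_proteins = set()
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein_id = fields[0]      # column 1: target name (the searched protein)
        evalue = float(fields[4])   # column 5: full-sequence E-value
        if evalue <= max_evalue:
            hit_proteins.add(protein_id)  # a set, so each protein counts once
    # Assumption: Prodigal-style IDs, so the contig is everything before the last "_".
    return Counter(protein.rsplit("_", 1)[0] for protein in hit_proteins)


# Hypothetical tblout rows, made up for illustration only.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
contigA_1  -  efam_000001  -  1.2e-30  105.3  0.1  2.0e-30  104.6  0.1  1.0  1  0  0  1  1  1  1  -
contigA_2  -  efam_000042  -  3.0e-08   30.1  0.0  4.1e-08   29.7  0.0  1.0  1  0  0  1  1  1  1  -
contigA_2  -  efam_000077  -  5.0e-07   26.0  0.0  6.0e-07   25.8  0.0  1.0  1  0  0  1  1  1  1  -
contigB_1  -  efam_000042  -  0.5        3.2  0.0  0.7        2.9  0.0  1.0  1  0  0  1  1  1  1  -
""".splitlines()

counts = viral_hits_per_contig(SAMPLE)
print(dict(counts))  # {'contigA': 2}
```

Counting each protein once, even when it matches several profiles, keeps a single multi-hit gene from inflating a contig's tally. A real pipeline would typically also weigh contig length and non-viral hits.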

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); Gordon and Betty Moore Foundation; National Science Foundation (NSF)
Grant/Contract Number:
DE-SC0020173; AC02-05CH11231; AC05-76RL01830; SC0020173; 3790; OCE-1536989; OCE-1829831; ABI-1758974
OSTI ID:
1888874
Alternate ID(s):
OSTI ID: 1808207; OSTI ID: 1904306
Journal Information:
Journal Name: Bioinformatics; Journal Volume: 37; Journal Issue: 22; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
