DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria

Journal Article · PLoS Biology (Online)
[1];  [1];  [2];  [3];  [4];  [1];  [5]
  1. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). USDOE Joint Genome Institute
  2. Instituto de Ciencias del Mar (ICM-CSIC), Barcelona (Spain)
  3. University of Iowa College of Dentistry, Iowa City, IA (United States)
  4. Friedrich Schiller University, Jena (Germany); Utrecht University (Netherlands)
  5. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); European Research Council (ERC)
Grant/Contract Number:
AC02-05CH11231; 865694
OSTI ID:
1972401
Alternate ID(s):
OSTI ID: 1971185; OSTI ID: 2228615
Journal Information:
PLoS Biology (Online), Vol. 21, Issue 4; ISSN 1545-7885
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
