DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine-Learning Classification Suggests That Many Alphaproteobacterial Prophages May Instead Be Gene Transfer Agents

Journal Article · Genome Biology and Evolution
DOI: https://doi.org/10.1093/gbe/evz206 · OSTI ID:1625346
 [1]; [2]; [3]; [4]; [5]; [6]
  1. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; DOE/OSTI
  2. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; Amazon.com Inc., Seattle, WA (United States)
  3. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; Harvard University, Cambridge, MA (United States). School of Engineering and Applied Sciences
  4. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; Harvard Univ., Cambridge, MA (United States). Dept. of Earth and Planetary Sciences
  5. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Bioscience Div.
  6. Dartmouth College, Hanover, NH (United States). Dept. of Biological Sciences; Dartmouth College, Hanover, NH (United States). Dept. of Computer Science

Abstract: Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the “head–tail” gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a “support vector machine” classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like “head–tail” gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
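
The general idea described in the abstract, separating two groups of homologous proteins by their amino acid composition with a support vector machine, can be illustrated with a minimal sketch. This is not the authors' implementation: this record does not give the kernel, feature set, or training procedure, so the sketch assumes plain 20-dimensional amino acid frequencies and a linear SVM fitted by subgradient descent on the hinge loss. The function names, hyperparameters, and toy sequences below are all hypothetical and chosen only for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the relative frequency of each standard amino acid in seq."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Fit a linear SVM (labels +1/-1) by stochastic subgradient descent
    on the L2-regularized hinge loss. Hyperparameters are illustrative."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularization shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge loss contributes only when the margin is violated.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, seq):
    """Return +1 or -1 for a protein sequence under the fitted model."""
    x = aa_composition(seq)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented toy sequences, NOT real GTA or viral proteins.
TOY_GTA_LIKE = ["AAAAGGGGAL", "AAGGAAGGAV", "GGGAAAAGGA"]    # label +1
TOY_VIRUS_LIKE = ["KKKEEEKKEL", "EEKKEEKKEV", "KEKEKEKEKK"]  # label -1

X = [aa_composition(s) for s in TOY_GTA_LIKE + TOY_VIRUS_LIKE]
y = [1] * len(TOY_GTA_LIKE) + [-1] * len(TOY_VIRUS_LIKE)
w, b = train_linear_svm(X, y)

for query in ["AGAGAGAGAG", "KEKEKEEEKK"]:
    print(query, predict(w, b, query))
```

Composition features ignore residue order, so such a model can only pick up differences in overall amino acid usage. A practical classifier would also need curated training sets and cross-validation, which this sketch omits.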

Research Organization:
Los Alamos National Lab (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
Dartmouth Dean of Faculty; Dartmouth James O. Freedman Presidential Scholarship; National Science Foundation (NSF); Simons Foundation Investigator in Mathematical Modeling of Living Systems; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1625346
Journal Information:
Genome Biology and Evolution, Vol. 11, Issue 10; ISSN 1759-6653
Publisher:
Society for Molecular Biology and Evolution
Country of Publication:
United States
Language:
English
