U.S. Department of Energy
Office of Scientific and Technical Information

Thousands of small, novel genes predicted in global phage genomes

Journal Article · Cell Reports
Joint Genome Institute, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); GP-SmORF Consortium, et al.
We report that small genes (<150 nucleotides) have been systematically overlooked in phage genomes. We employ a large-scale comparative genomics approach to predict >40,000 small-gene families in ~2.3 million phage genome contigs. We find that small genes in phage genomes are approximately 3-fold more prevalent than in host prokaryotic genomes. Our approach enriches for small genes that are translated in microbiomes, suggesting the small genes identified are coding. More than 9,000 families encode potentially secreted or transmembrane proteins, more than 5,000 families encode predicted anti-CRISPR proteins, and more than 500 families encode predicted antimicrobial proteins. By combining homology and genomic-neighborhood analyses, we reveal substantial novelty and diversity within phage biology, including small phage genes found in multiple host phyla, small genes encoding proteins that play essential roles in host infection, and small genes that share genomic neighborhoods and whose encoded proteins may share related functions.
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
National Institutes of Health (NIH); National Science Foundation (NSF); Simons Foundation; USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Contributing Organization:
Global Phage Small Open Reading Frame (GP-SmORF) Consortium
Grant/Contract Number:
AC02-05CH11231; AC05-00OR22725; AC52-07NA27344
OSTI ID:
1876276
Alternate ID(s):
OSTI ID: 1889692
OSTI ID: 2565839
OSTI ID: 2583347
Report Number(s):
LLNL--JRNL-854337
Journal Information:
Cell Reports, Vol. 39, Issue 12; ISSN 2211-1247
Publisher:
Elsevier
Country of Publication:
United States
Language:
English


Similar Records

Thousands of small, novel genes predicted in global phage genomes
Journal Article · Tue Jun 21 2022 · Cell Reports · OSTI ID: 2565839