U.S. Department of Energy
Office of Scientific and Technical Information

Thousands of small, novel genes predicted in global phage genomes

Journal Article · Cell Reports
Small genes (<150 nucleotides) have been systematically overlooked in phage genomes. We employ a large-scale comparative genomics approach to predict >40,000 small-gene families in 2.3 million phage genome contigs. We find that small genes in phage genomes are approximately 3-fold more prevalent than in host prokaryotic genomes. Our approach enriches for small genes that are translated in microbiomes, suggesting the small genes identified are coding. More than 9,000 families encode potentially secreted or transmembrane proteins, more than 5,000 families encode predicted anti-CRISPR proteins, and more than 500 families encode predicted antimicrobial proteins. By combining homology and genomic-neighborhood analyses, we reveal substantial novelty and diversity within phage biology, including small phage genes found in multiple host phyla, small genes encoding proteins that play essential roles in host infection, and small genes that share genomic neighborhoods and whose encoded proteins may share related functions.
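To make the abstract's size threshold concrete, the following is a minimal sketch of how open reading frames shorter than 150 nucleotides could be enumerated on the forward strand of a contig. It is not the authors' pipeline, which relies on comparative genomics rather than a simple scan. The function name `find_small_orfs`, the toy sequence, and the choices to accept only ATG starts, to use the standard stop codons, and to count the stop codon in the ORF length are all assumptions made for illustration.

```python
# Illustrative only: a naive forward-strand ORF scan, not the published method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stops; reassignments are ignored
MAX_SMALL_GENE_NT = 150  # threshold from the abstract: small genes are <150 nt


def find_small_orfs(seq, max_len=MAX_SMALL_GENE_NT):
    """Return (start, end, frame) for forward-strand ORFs shorter than max_len.

    An ORF here runs from an ATG to the first in-frame stop codon,
    with the stop codon included in the reported span (hypothetical convention).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start < max_len:
                    orfs.append((start, end, frame))
                start = None  # close the ORF and look for the next start
    return orfs


toy_contig = "CCATGAAATTTGGGTAACC"  # made-up sequence, not real phage data
print(find_small_orfs(toy_contig))
```

A scan like this would report a very large number of spurious candidates on real contigs, which is one reason evidence such as conservation across gene families and translation signal matters when deciding whether a small ORF is likely to be coding.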
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
2565839
Report Number(s):
PNNL-SA-171375
Journal Information:
Journal Name: Cell Reports; Journal Volume: 39; Journal Issue: 12
Country of Publication:
United States
Language:
English


Similar Records

Thousands of small, novel genes predicted in global phage genomes
Journal Article · Mon Jun 20 20:00:00 EDT 2022 · Cell Reports · OSTI ID:1876276
