U.S. Department of Energy
Office of Scientific and Technical Information

PHANOTATE: a novel approach to gene identification in phage genomes

Journal Article · Bioinformatics
 [1];  [2];  [3];  [2];  [4]
  1. San Diego State Univ., CA (United States). Computational Sciences Research Center; DOE/OSTI
  2. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  3. San Diego State Univ., CA (United States). Dept. of Biology
  4. San Diego State Univ., CA (United States). Computational Sciences Research Center. Dept. of Biology. Viral Information Inst.

Motivation: Currently there are no tools specifically designed for annotating genes in phages. Several tools have been adapted to run on phage genomes, but because of their underlying design they cannot capture the full complexity of those genomes. Phages have evolved extremely compact genomes, with adjacent genes that overlap and genes that lie completely inside other, longer genes. This non-delineated genome structure makes gene prediction difficult for the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths in which open reading frames are favorable, and overlaps and gaps are less favorable but still possible. We represent this network of connections as a weighted graph and use dynamic programming to find the optimal path.

Results: We compare PHANOTATE to other gene callers by annotating a set of 2133 complete phage genomes from GenBank, using PHANOTATE and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with PHANOTATE predicting more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and all of the metagenomes in the Sequence Read Archive, and found that they are present at levels that suggest they are functional protein-coding genes.

Availability and implementation: https://github.com/deprekate/PHANOTATE.

Supplementary information: Supplementary data are available at Bioinformatics online.
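The weighted-graph idea in the abstract can be illustrated with a small sketch. This is not PHANOTATE's implementation: the abstract does not give the actual weighting scheme, so the `Orf` record, the `score` values, and the `gap_cost` and `overlap_cost` penalties below are all hypothetical, and only a single strand is modeled. Bellman-Ford is used here as one dynamic programming shortest-path method that tolerates the negative weights assigned to favorable open reading frames.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    start: int
    stop: int
    score: float  # hypothetical: higher means more gene-like


def build_edges(orfs, genome_len, gap_cost=0.1, overlap_cost=0.5):
    """Return (u, v, weight) edges; all weights here are illustrative assumptions."""
    # Allow the empty annotation: one long gap from source to sink.
    edges = [("source", "sink", gap_cost * genome_len)]
    for i, o in enumerate(orfs):
        # Traversing an ORF is favorable, so it gets a negative weight.
        edges.append((("start", i), ("stop", i), -o.score))
        # Leading and trailing non-coding regions are penalized as gaps.
        edges.append(("source", ("start", i), gap_cost * o.start))
        edges.append((("stop", i), "sink", gap_cost * (genome_len - o.stop)))
        for j, p in enumerate(orfs):
            # Simplification: only connect to ORFs that end further downstream,
            # which keeps the graph acyclic (fully nested genes are not chained).
            if p.stop <= o.stop:
                continue
            dist = p.start - o.stop
            weight = gap_cost * dist if dist >= 0 else overlap_cost * -dist
            edges.append((("stop", i), ("start", j), weight))
    return edges


def bellman_ford(edges, source):
    """Shortest distances from source; negative edge weights are allowed."""
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dist = {n: float("inf") for n in nodes}
    prev = {}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
                changed = True
        if not changed:
            break
    return dist, prev


def call_genes(orfs, genome_len, **costs):
    """Return the ORFs on the lowest-cost path from source to sink."""
    _, prev = bellman_ford(build_edges(orfs, genome_len, **costs), "source")
    chosen = []
    node = "sink"
    while node != "source":
        node = prev[node]
        if isinstance(node, tuple) and node[0] == "stop":
            chosen.append(orfs[node[1]])
    return chosen[::-1]


if __name__ == "__main__":
    candidates = [Orf(0, 300, 30), Orf(290, 600, 25), Orf(100, 250, 2), Orf(650, 900, 20)]
    for gene in call_genes(candidates, 900):
        print(gene)
```

In this toy input, the second candidate overlaps the first by 10 positions; the path still includes both because the overlap penalty is smaller than the gain from the ORF, which is the trade-off the abstract describes between favorable open reading frames and less favorable overlaps and gaps.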

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC52-07NA27344
OSTI ID:
1625296
Journal Information:
Bioinformatics, Vol. 35, Issue 22; ISSN 1367-4803
Publisher:
International Society for Computational Biology - Oxford University Press
Country of Publication:
United States
Language:
English
