DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: OrthoPhyl—streamlining large-scale, orthology-based phylogenomic studies of bacteria at broad evolutionary scales

Journal Article · G3

Abstract
There are a staggering number of publicly available bacterial genome sequences (2.0 million assemblies in NCBI's GenBank alone at the time of writing), and the deposition rate continues to increase. This wealth of data calls for phylogenetic analyses that place these sequences within an evolutionary context. A phylogenetic placement not only aids taxonomic classification but also informs studies of the evolution of novel phenotypes, targets of selection, and horizontal gene transfer. Building trees from multi-gene codon alignments is a laborious task that requires bioinformatic expertise, rigorous curation of orthologs, and heavy computation. Compounding the problem is a lack of tools that streamline these processes for building trees from large-scale genomic data. Here we present OrthoPhyl, which takes bacterial genome assemblies and reconstructs trees from whole-genome codon alignments. The pipeline can analyze an arbitrarily large number of input genomes (>1200 tested here) by identifying a diversity-spanning subset of assemblies and using these genomes to build gene models, which are then used to infer orthologs across the full dataset. To illustrate the versatility of OrthoPhyl, we present three use cases: E. coli/Shigella, Brucella/Ochrobactrum, and the order Rickettsiales. We compare trees generated with OrthoPhyl to trees generated with kSNP3 and GToTree, as well as to published trees built with alternative methods. We show that OrthoPhyl trees are consistent with those from other methods while incorporating more data, accommodating more input genomes, and offering greater analytical flexibility.
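The scaling step described in the abstract, choosing a diversity-spanning subset of assemblies before building gene models, can be illustrated with a generic greedy max-min (farthest-point) heuristic. This is a minimal sketch under stated assumptions, not OrthoPhyl's implementation: the helper name `select_diverse_subset`, the use of a precomputed pairwise distance matrix between assemblies (for example 1 - ANI, or a Mash distance), and the toy distances are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_diverse_subset(dist: Sequence[Sequence[float]], k: int, seed: int = 0) -> List[int]:
    """Pick k genome indices by greedy max-min (farthest-point) selection.

    `dist` is assumed to be a symmetric pairwise distance matrix between
    assemblies. Hypothetical helper for illustration; OrthoPhyl's own
    subset-selection rule may differ.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of genomes")
    chosen = [seed]
    chosen_set = {seed}
    # Distance from each genome to its closest already-chosen genome.
    nearest = list(dist[seed])
    while len(chosen) < k:
        # Add the unchosen genome that is farthest from every chosen one.
        nxt = max((i for i in range(n) if i not in chosen_set), key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # Refresh closest-chosen distances now that nxt is in the subset.
        nearest = [min(nearest[i], dist[nxt][i]) for i in range(n)]
    return chosen


if __name__ == "__main__":
    # Toy distances: genomes 0-2 form one tight clade, genomes 3-4 another.
    toy = [
        [0.00, 0.01, 0.02, 0.30, 0.31],
        [0.01, 0.00, 0.015, 0.29, 0.30],
        [0.02, 0.015, 0.00, 0.28, 0.29],
        [0.30, 0.29, 0.28, 0.00, 0.02],
        [0.31, 0.30, 0.29, 0.02, 0.00],
    ]
    print(select_diverse_subset(toy, 2))  # [0, 4]: one genome per clade
```

Each added genome is the one least well represented by the current subset, so even a small k tends to span the major clades. In the workflow the abstract describes, gene models built from such a subset are then used to infer orthologs across the full set of input genomes.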

Sponsoring Organization:
USDOE
Grant/Contract Number:
89233218CNA000001
OSTI ID:
2427443
Alternate ID(s):
OSTI ID: 2391070; OSTI ID: 2469611
Journal Information:
G3, Vol. 14, Issue 8; ISSN 2160-1836
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English

References (62)

NCBI Taxonomy: a comprehensive update on curation, resources and tools journal January 2020
The battle for user-friendly bioinformatics journal January 2013
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability journal January 2013
Are Combined Analyses Better Than Single Gene Phylogenies? A Case Study Using SSU rDNA and rbcL Sequence Comparisons in the Zygnematophyceae (Streptophyta) journal December 2003
Reclassification of Ochrobactrum lupini as a later heterotypic synonym of Ochrobactrum anthropi based on whole-genome sequence analysis journal August 2019 (Gazolla Volpiano, Camila; Hayashi Sant'Anna, Fernando; Ambrosini, Adriana; International Journal of Systematic and Evolutionary Microbiology, Vol. 69, Issue 8; https://doi.org/10.1099/ijsem.0.003465)
High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries journal November 2018
Identification of Recombination and Positively Selected Genes in Brucella journal July 2015
Conditioned Genome Reconstruction: How to Avoid Choosing the Conditioning Genome journal February 2007
Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy journal October 2021
The evolutionary origin of host association in the Rickettsiales journal July 2022
An evaluation of transcriptome‐based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura) journal June 2016
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies journal January 2014
IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies journal November 2014
RevTrans: multiple alignment of coding DNA from aligned amino acid sequences journal July 2003
A Practical Guide to Design and Assess a Phylogenomic Study journal August 2022
Missing Data in Phylogenetic Analysis: Reconciling Results from Simulations and Empirical Data journal March 2011
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses journal June 2009
Automated Reconstruction of Whole-Genome Phylogenies from Short-Sequence Reads journal March 2014
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments journal March 2010
ModelFinder: fast model selection for accurate phylogenetic estimates journal May 2017
Mugsy: fast multiple alignment of closely related whole genomes journal December 2010
When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes journal December 2013
Using Core Genome Alignments To Assign Bacterial Species journal December 2018
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation journal November 2015
Microbial species delineation using whole genome sequences journal July 2015
Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements journal June 2004
The human phylome journal June 2007
A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation journal May 2008
Brucella ceti and Brucella pinnipedialis genome characterization unveils genetic features that highlight their zoonotic potential journal October 2022
ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy journal September 2020
Genome sequence-based criteria for demarcation and definition of species in the genus Rickettsia journal March 2020
DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies journal June 2023
On the Best Evolutionary Rate for Phylogenetic Analysis journal March 1998
KINN: An alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences journal February 2023
GenBank journal November 2012
Brucella Genomics: Macro and Micro Evolution journal October 2020
OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy journal August 2015
The neighbor-joining method: a new method for reconstructing phylogenetic trees journal July 1987
Development and evaluation of a core genome multilocus sequence typing (cgMLST) scheme for Brucella spp. journal January 2019
Cells within cells: Rickettsiales and the obligate intracellular bacterial lifestyle journal February 2021
Building Phylogenetic Trees From Genome Sequences With kSNP4 journal November 2023
transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences journal June 2005
ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees journal May 2018
OrthoFinder: phylogenetic orthology inference for comparative genomics journal November 2019
Application of Whole Genome Sequencing and Pan-Family Multi-Locus Sequence Analysis to Characterize Relationships Within the Family Brucellaceae journal July 2020
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes journal May 2015
Genome-Wide Comparative Analysis of Phylogenetic Trees: The Prokaryotic Forest of Life book July 2019
Rickettsial genomics and the paradigm of genome reduction associated with increased virulence journal August 2018
PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments journal July 2006
GToTree: a user-friendly workflow for phylogenomics journal March 2019
Standardized phylogenetic and molecular evolutionary analysis applied to species across the microbial tree of life journal February 2020
Improved sensitivity of nucleic acid database searches using application-specific scoring matrices journal August 1991
Prokaryotic taxonomy and phylogeny in the genomic era: advancements and challenges ahead journal October 2007
UFBoot2: Improving the Ultrafast Bootstrap Approximation journal October 2017
Analysis of 1,000+ Type-Strain Genomes Substantially Improves Taxonomic Classification of Alphaproteobacteria journal April 2020
Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice journal March 2013
kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome journal April 2015
Incongruence between multi-locus sequence analysis (MLSA) and whole-genome-based phylogenies: Pseudomonas syringae pathovar pisi as a cautionary tale journal January 2014
The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes journal November 2014
ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data journal February 2016
Prodigal: prokaryotic gene recognition and translation initiation site identification journal March 2010
Accelerated Profile HMM Searches journal October 2011