DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Novel symmetry-preserving neural network model for phylogenetic inference

Journal Article · Bioinformatics Advances

Abstract

Motivation: Scientists worldwide are making massive efforts to understand how the biodiversity that we see on Earth evolved from single-cell organisms at the origin of life, a diversification process represented by the Tree of Life. Low sampling rates and high heterogeneity in the rate of evolution across sites and lineages produce a phenomenon known as “long branch attraction” (LBA), in which long nonsister lineages are estimated to be sisters regardless of their true evolutionary relationship. LBA has been a pervasive problem in phylogenetic inference, affecting methodologies ranging from distance-based to likelihood-based.

Results: Here, we present a novel neural network model that outperforms standard phylogenetic methods and other neural network implementations under LBA settings. Furthermore, unlike existing neural network models in phylogenetics, our model naturally accounts for tree isomorphisms via permutation-invariant functions, which ultimately results in lower memory use and allows seamless extension to larger trees.

Availability and implementation: We implement our novel theory in an open-source, publicly available GitHub repository: https://github.com/crsl4/nn-phylogenetics.
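
The abstract's notion of accounting for tree isomorphisms through permutation-invariant functions can be illustrated with a small sketch. This is not the authors' architecture or code from their repository: the functions `phi`, `pair_embed`, `quartet_score`, and `equivalent_orderings` are hypothetical names, the embedding is a fixed toy map rather than a trained network, and each taxon is assumed to be summarized by a short numeric feature vector. The sketch only shows the general idea that combining sister taxa with symmetric operations (sums and products) yields a score for the quartet topology ((a,b),(c,d)) that does not depend on how the taxa are ordered within that topology.

```python
import math


def phi(x):
    # Hypothetical per-taxon embedding (a fixed toy map, not a trained layer).
    return [math.tanh(v) for v in x] + [v * v for v in x]


def vec_add(u, v):
    # Elementwise sum of two equal-length vectors.
    return [p + q for p, q in zip(u, v)]


def pair_embed(x, y):
    # Summing the two embeddings is symmetric, so swapping sisters x and y
    # leaves the result unchanged.
    return [math.tanh(v) for v in vec_add(phi(x), phi(y))]


def quartet_score(a, b, c, d):
    # Score for the topology ((a,b),(c,d)). The two cherries are combined
    # with a sum and an elementwise product, both symmetric, so swapping
    # the cherries also leaves the score unchanged.
    left = pair_embed(a, b)
    right = pair_embed(c, d)
    summed = vec_add(left, right)
    prod = [l * r for l, r in zip(left, right)]
    return sum(summed) + sum(prod)


def equivalent_orderings(a, b, c, d):
    # The 8 taxon orderings that describe the same topology ((a,b),(c,d)):
    # swap within the left cherry, within the right cherry, or swap cherries.
    out = []
    for left in ((a, b), (b, a)):
        for right in ((c, d), (d, c)):
            out.append(left + right)
            out.append(right + left)
    return out


if __name__ == "__main__":
    # Assumed toy feature vectors, one per taxon.
    a, b, c, d = [1.0, 0.0], [0.0, 1.0], [2.0, 0.5], [-1.0, 0.3]
    scores = [quartet_score(*o) for o in equivalent_orderings(a, b, c, d)]
    print(all(math.isclose(s, scores[0]) for s in scores))
```

Because the invariance is built into the function itself, a model of this kind would not need to see every reordering of the same tree during training, which is one plausible reading of the memory savings the abstract mentions. A different grouping, such as ((a,c),(b,d)), is generally assigned a different score.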

Research Organization:
University of Wisconsin, Madison, WI (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0021016
OSTI ID:
2338226
Journal Information:
Bioinformatics Advances, Vol. 4, Issue 1; ISSN 2635-0041
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
