U.S. Department of Energy
Office of Scientific and Technical Information

An investigation of irreproducibility in maximum likelihood phylogenetic inference

Journal Article · Nature Communications

Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type).
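As an illustration of what "topologically irreproducible" means in practice, the sketch below compares two Newick trees by their sets of bipartitions (splits), the basis of the Robinson-Foulds distance. The abstract does not state which comparison tool or metric the authors used, so this is an assumption for illustration only. The function names (`parse_newick`, `bipartitions`, `rf_distance`, `same_topology`, `percent`) are hypothetical, and the parser handles only simple unquoted Newick strings.

```python
import re

PUNCT = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names.
    Branch lengths and internal node labels are ignored."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip the closing ")"
            # Skip an optional internal label and/or branch length.
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]  # drop any branch length

    return parse_node()

def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def bipartitions(tree):
    """Return the non-trivial splits of the tree, treated as unrooted."""
    all_taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:
            # An unordered pair, so rooting does not affect the result.
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits

def rf_distance(newick_a, newick_b):
    """Number of splits found in exactly one of the two trees."""
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    if leaves(a) != leaves(b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(a) ^ bipartitions(b))

def same_topology(newick_a, newick_b):
    """True when two runs recovered the same unrooted topology."""
    return rf_distance(newick_a, newick_b) == 0

def percent(count, total):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * count / total, 2)

# Toy stand-ins for the Run1 and Run2 trees of one gene alignment.
run1 = "((A:0.1,B:0.2)90:0.3,(C:0.1,D:0.1):0.2,E:0.5);"
run2 = "(E,(D,C),(B,A));"
print(same_topology(run1, run2))
print(percent(3515, 19414), percent(1813, 19414))
```

A pair of runs counts as reproducible under this sketch when the distance is zero. The `percent` helper simply re-derives the reported fractions from the counts given above (3515 and 1813 out of 19,414 alignments).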

Research Organization:
Great Lakes Bioenergy Research Center, Madison, WI (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
SC0018409
OSTI ID:
1764993
Journal Information:
Nature Communications, Vol. 11, Issue 1; ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English

Similar Records

The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics
Journal Article · June 28, 2024 · Systematic Biology · OSTI ID: 2406411