Syst. Biol. 54(1):146–157, 2005
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150590905984
Missing the Forest for the Trees: Phylogenetic Compression and Its Implications
for Inferring Complex Evolutionary Histories
CÉCILE ANɹ,² AND MICHAEL J. SANDERSON¹
¹ Section of Evolution and Ecology, University of California, Davis, California 95616, USA; E-mail: mjsanderson@ucdavis.edu (M.J.S.)
² Current address: Department of Statistics, University of Wisconsin-Madison, Medical Science Center, 1300 University Ave.,
Madison, WI 53706, USA; E-mail: ane@cs.wisc.edu
Abstract. Phylogenetic tree reconstruction is difficult in the presence of lateral gene transfer and other processes generating
conflicting signals. We develop a new approach to this problem using ideas borrowed from algorithmic information theory.
It selects the hypothesis that simultaneously minimizes the descriptive complexity of the tree(s) plus that of the data when encoded
using those tree(s). In practice, this is the hypothesis that compresses the data the most. We show not only that phylogenetic
compression is an efficient method for encoding most phylogenetic data sets and is more efficient than compression schemes
designed for single sequences, but also that it provides a clear information-theoretic rule for determining when a collection of
conflicting trees is a better explanation of the data than a single tree. By casting the parsimony problem in this more general