 
Summary: Identifying evolutionary trees and
substitution parameters for the general
Markov model with invariable sites
Elizabeth S. Allman, John A. Rhodes
Department of Mathematics and Statistics
University of Alaska Fairbanks
PO Box 756660
Fairbanks, AK 99775
Abstract
The general Markov plus invariable sites (GM+I) model of biological sequence evolution
is a two-class model in which an unknown proportion of sites are not allowed
to change, while the remainder undergo substitutions according to a Markov process
on a tree. For statistical use it is important to know whether the model is identifiable:
can both the tree topology and the numerical parameters be determined from a
joint distribution describing sequences only at the leaves of the tree? We establish
that for generic parameters both the tree and all numerical parameter values can
be recovered, up to clearly understood issues of "label swapping." The method of
analysis is algebraic, using phylogenetic invariants to study the variety defined by
the model. Simple rational formulas, expressed in terms of determinantal ratios, are
found for recovering numerical parameters describing the invariable sites.
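
To make the two-class structure concrete, the following is a minimal sketch of how a GM+I joint distribution at the leaves might be assembled. It is an illustration under stated assumptions, not the construction used in the paper: the tree is simplified to a star tree with the root at the central node, and the names `gm_star_tree`, `gm_plus_i`, `delta` (proportion of invariable sites), and `pi_inv` (state distribution of the invariable sites) are hypothetical, as are all numerical values.

```python
from itertools import product


def gm_star_tree(root, edges):
    """Joint leaf distribution of a general Markov model on a star tree.

    root  : list of k root-state probabilities.
    edges : one k x k row-stochastic matrix per leaf (root -> leaf).
    Returns a dict mapping each leaf pattern (tuple of states) to its probability.
    """
    k = len(root)
    dist = {}
    for pattern in product(range(k), repeat=len(edges)):
        total = 0.0
        for r in range(k):
            # Probability of root state r and the observed state at each leaf.
            term = root[r]
            for matrix, state in zip(edges, pattern):
                term *= matrix[r][state]
            total += term
        dist[pattern] = total
    return dist


def gm_plus_i(root, edges, delta, pi_inv):
    """Mix the GM class (weight 1 - delta) with an invariable class (weight delta).

    Invariable sites show the same state at every leaf, so their mass sits
    only on the constant patterns (s, s, ..., s).
    """
    gm = gm_star_tree(root, edges)
    dist = {p: (1.0 - delta) * q for p, q in gm.items()}
    for s, weight in enumerate(pi_inv):
        dist[(s,) * len(edges)] += delta * weight
    return dist


# Toy two-state, three-leaf example with made-up parameter values.
root = [0.6, 0.4]
M = [[0.9, 0.1], [0.2, 0.8]]
dist = gm_plus_i(root, [M, M, M], delta=0.25, pi_inv=[0.5, 0.5])
```

In this sketch the invariable class only adds mass to the constant patterns. The identifiability question in the abstract asks whether the tree and parameters such as `delta` and `pi_inv` can be recovered from a distribution like `dist` alone.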
