 
Summary: Calculating Likelihoods on Phylogenetic trees
John P. Huelsenbeck
March 27, 2010
1 Assumptions of phylogenetic methods
The models used in phylogenetic analysis of molecular data have three components. First, they assume a
tree relating the samples. Here, the samples might be DNA sequences collected from different species, or
different individuals within a population. In either case, a basic assumption is that the samples are related to
one another through an (unknown) tree. This would be a species tree for sequences sampled from different
species, or perhaps a coalescence tree for sequences sampled from individuals within a population.
Second, they assume that the branches of the tree have (unknown) lengths. Ideally, a branch length would be
measured in units of time. In practice, however, the duration of a branch is difficult to determine, so branch
lengths are instead expressed as the expected number of changes per character. Figure 1 shows some examples
of trees with branch lengths. The main points the reader should
remember are: (1) Trees can be rooted or unrooted. Rooted trees have a time direction whereas unrooted
trees do not. Most methods of phylogenetic inference, including most implementations of maximum likelihood
and Bayesian analysis, are based on time-reversible models of evolution that produce unrooted trees, which
must be rooted using some other criterion, such as the outgroup criterion (using distantly related reference
sequences to locate the root). (2) The space of possible trees is huge. The number of possible unrooted
trees for n species is $B(n) = \frac{(2n-5)!}{2^{n-3}(n-3)!}$ (Schröder, 1870). This means that for a relatively small problem
