 
Summary: Exercises
November 11, 2003
1. The Markov matrices that describe real DNA mutation tend to have their largest entries along the
main diagonal in the (1,1), (2,2), (3,3), and (4,4) positions. Why should this be the case?
2. An ancestral DNA sequence of 40 bases was
CTAGGCTTACGATTACGAGGATCCAAATGGCACCAATGCT,
but in a descendant it had mutated to
CTACGCTTACGACAACGAGGATCCGAATGGCACCATTGCT.
a. Compute the Jukes-Cantor (JC) distance between the sequences.
b. Give an initial base distribution vector and a Markov matrix to describe the mutation process.
c. These sequences were actually produced by a Jukes-Cantor simulation. Is that surprising? Explain.
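A hand computation for part (a) can be checked with a short script. The sketch below assumes the standard Jukes-Cantor distance formula d = -(3/4) ln(1 - (4/3) p), where p is the fraction of sites at which the two aligned sequences differ. The function name jc_distance is a hypothetical helper, not something defined in these exercises.

```python
import math


def jc_distance(s0: str, s1: str) -> float:
    """Jukes-Cantor distance between two aligned sequences of equal length.

    Assumes the standard formula d = -(3/4) * ln(1 - (4/3) * p),
    where p is the fraction of sites that differ.
    """
    if len(s0) != len(s1) or len(s0) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count the sites at which the two sequences disagree.
    differing = sum(a != b for a, b in zip(s0, s1))
    p = differing / len(s0)
    # The formula is undefined once p reaches 3/4 (the log argument hits 0).
    if p >= 0.75:
        raise ValueError("too many differing sites for the JC formula")
    return -0.75 * math.log(1 - 4 * p / 3)


# The two sequences from exercise 2.
ancestor = "CTAGGCTTACGATTACGAGGATCCAAATGGCACCAATGCT"
descendant = "CTACGCTTACGACAACGAGGATCCGAATGGCACCATTGCT"
print(jc_distance(ancestor, descendant))
```

The same function applies to any pair of aligned sequences, so it can also be used to compare several simulated descendants of one ancestor.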
3. Data from two comparisons of 400-base ancestral and descendant sequences are shown in the tables below.
S1 \ S0    A    G    C    T
A         92   15    2    2
G         13   84    4    4
C          0    1   77   16
T          4    2   14   70
S1 \ S0    A    G    C    T
A         90    3    3    2
G          3   79    8    2
