 
Summary: CS680: Assignment SBQ3
Due February 2nd. For SURGE, due February 16th.
Question 1.1 Do Exercise 3.1 in Sutton and Barto's draft.
Question 1.2 Do Exercise 3.2.
Question 1.3 Do Exercise 3.4.
Question 1.4 Prove that Equation (3.4) follows from Equation (3.3) and the definition
$\alpha(a) = 1/n_{t+1}(a)$.
Question 1.5 Prove that the probability update rule in Equation (3.7) keeps the sum of
the probabilities equal to one. Does this depend on the value of $\beta$? Does this depend on the
initial probabilities?
Question 1.6 To gain an intuitive "feel" for the Boltzmann distribution, let's plot it. Assume
you are dealing with an MDP (What does this stand for?) consisting of one state and two
actions, $a_1$ and $a_2$. Since we are using the Boltzmann distribution to balance exploration with
exploitation, the probability of selecting action $a_1$ is
\[ p(a_1) = \frac{e^{Q(a_1)/T}}{e^{Q(a_1)/T} + e^{Q(a_2)/T}}. \]
Set $Q(a_2) = 1$ and $T = 1$. Plot $p(a_1)$ versus $Q(a_1)$ as $Q(a_1)$ ranges from $-5$ to 5. At what
value of $Q(a_1)$ does $p(a_1) = 0.5$? Why? Explain in words what this plot shows. Draw two more
plots of $p(a_1)$ using two different values of $T$. Choose one value of $T$ that results in an almost
constant $p(a_1)$ equal to 0.5, meaning that each action will be randomly selected equally often.
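
As a possible starting point for the plots, here is a minimal Python sketch that evaluates the two-action Boltzmann probability given above. The function name `boltzmann_p_a1` and its parameter names are illustrative assumptions, not part of the assignment. Subtracting the larger Q value before exponentiating is an implementation choice to avoid overflow at small T; it does not change the ratio.

```python
import math


def boltzmann_p_a1(q_a1, q_a2=1.0, temperature=1.0):
    """Probability of selecting a_1 under the two-action Boltzmann rule.

    Hypothetical helper: the defaults mirror the assignment's
    Q(a_2) = 1 and T = 1.
    """
    # Shift both exponents by the larger Q value. The common factor
    # cancels in the ratio, so the result is unchanged, but exp()
    # cannot overflow when the temperature is small.
    shift = max(q_a1, q_a2)
    e1 = math.exp((q_a1 - shift) / temperature)
    e2 = math.exp((q_a2 - shift) / temperature)
    return e1 / (e1 + e2)


# Tabulate p(a_1) for integer Q(a_1) from -5 to 5 at T = 1.
for q in range(-5, 6):
    print(f"Q(a_1) = {q:+d}   p(a_1) = {boltzmann_p_a1(q):.4f}")
```

The tabulated pairs can be passed to any plotting tool; calling the function with a different `temperature` argument gives the values for the additional plots.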
