CS680: Assignment SBQ3
Due February 2nd. For SURGE, due February 16th.

Question 1.1 Do Exercise 3.1 in Sutton and Barto's draft.

Question 1.2 Do Exercise 3.2.

Question 1.3 Do Exercise 3.4.

Question 1.4 Prove that Equation (3.4) follows from Equation (3.3) and the definition $\alpha(a) = 1/n_{t+1}(a)$.

Question 1.5 Prove that the probability update rule in Equation (3.7) maintains the sum of the probabilities at one. Does this depend on the value of $\beta$? Does this depend on the initial probabilities?

Question 1.6 To gain an intuitive "feel" for the Boltzmann distribution, let's plot it. Assume you are dealing with an MDP (what does this stand for?) consisting of one state and two actions, $a_1$ and $a_2$. Since we are using the Boltzmann distribution to balance exploration with exploitation, the probability of selecting action $a_1$ is

$$p(a_1) = \frac{e^{Q(a_1)/T}}{e^{Q(a_1)/T} + e^{Q(a_2)/T}}.$$

Set $Q(a_2) = 1$ and $T = 1$. Plot $p(a_1)$ versus $Q(a_1)$ as $Q(a_1)$ ranges from $-5$ to $5$. At what value of $Q(a_1)$ does $p(a_1) = 0.5$? Why? Explain in words what this plot shows. Draw two more plots of $p(a_1)$ using two different values of $T$. Choose one value of $T$ that results in an almost constant $p(a_1)$ equal to 0.5, meaning that each action will be randomly selected equally often.
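As a starting point for the plots in Question 1.6, the two-action Boltzmann selection probability can be evaluated numerically. The following is a minimal sketch, not a prescribed solution: the function names `boltzmann_p_a1` and `sample_curve`, their parameter names, and the choice of 21 sample points are assumptions made for illustration. It uses only the Python standard library and returns (Q(a_1), p(a_1)) pairs that can be handed to whatever plotting tool you prefer.

```python
import math


def boltzmann_p_a1(q_a1, q_a2=1.0, temperature=1.0):
    """Probability of selecting a_1 under a two-action Boltzmann distribution.

    Hypothetical helper: the defaults q_a2=1 and temperature=1 mirror the
    settings given in Question 1.6.
    """
    # Dividing numerator and denominator by exp(q_a1 / T) gives the
    # equivalent form 1 / (1 + exp((q_a2 - q_a1) / T)).
    z = (q_a2 - q_a1) / temperature
    # Branch on the sign of z so math.exp never sees a large positive
    # argument, which would overflow for very small temperatures.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def sample_curve(temperature=1.0, q_min=-5.0, q_max=5.0, steps=21):
    """Return (Q(a_1), p(a_1)) pairs on an evenly spaced grid (assumed size)."""
    width = (q_max - q_min) / (steps - 1)
    grid = [q_min + i * width for i in range(steps)]
    return [(q, boltzmann_p_a1(q, temperature=temperature)) for q in grid]


if __name__ == "__main__":
    # Print a small table of values; replace with a plotting call as needed.
    for q, p in sample_curve():
        print(f"{q:6.2f}  {p:.4f}")
```

Calling `sample_curve` again with other values of `temperature` produces the data for the additional plots the question asks for.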