CS680: Assignment SBQ3 Due February 2nd. For SURGE, due February 16th
 

Question 1.1 Do Exercise 3.1 in Sutton and Barto's draft.
Question 1.2 Do Exercise 3.2.
Question 1.3 Do Exercise 3.4.
Question 1.4 Prove that Equation (3.4) follows from Equation (3.3) and the definition
$\alpha(a) = 1/n_{t+1}(a)$.
Question 1.5 Prove that the probability update rule in Equation (3.7) maintains the sum of
the probabilities to be one. Does this depend on the value of $\beta$? Does this depend on the initial
probabilities?
Question 1.6 To gain an intuitive "feel" for the Boltzmann distribution, let's plot it. Assume
you are dealing with an MDP (What does this stand for?) consisting of one state and two
actions, $a_1$ and $a_2$. Since we are using the Boltzmann distribution to balance exploration with
exploitation, the probability of selecting action $a_1$ is
$$p(a_1) = \frac{e^{Q(a_1)/T}}{e^{Q(a_1)/T} + e^{Q(a_2)/T}}.$$
Set $Q(a_2) = 1$ and $T = 1$. Plot $p(a_1)$ versus $Q(a_1)$ as $Q(a_1)$ ranges from $-5$ to $5$. At what
value of $Q(a_1)$ does $p(a_1) = 0.5$? Why? Explain in words what this plot shows. Draw two more
plots of $p(a_1)$ using two different values of $T$. Choose one value of $T$ that results in an almost
constant $p(a_1)$ equal to 0.5, meaning that each action will be randomly selected equally often.
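
As a starting point for the plots in Question 1.6, here is a minimal Python sketch of the two-action Boltzmann probability. The function names (boltzmann_p1, curve, plot_curves), the 101-point grid, and the example temperatures are illustrative assumptions, not part of the assignment. The plotting helper assumes matplotlib is installed.

```python
import math


def boltzmann_p1(q1, q2=1.0, temperature=1.0):
    """Probability of selecting a_1 under the two-action Boltzmann rule.

    Algebraically equal to e^(q1/T) / (e^(q1/T) + e^(q2/T)), rewritten as a
    logistic function of (q1 - q2) / T so that small T does not overflow.
    """
    z = (q1 - q2) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def curve(temperature, q_min=-5.0, q_max=5.0, steps=101):
    """Return (Q(a_1) values, p(a_1) values) with Q(a_2) fixed at 1."""
    qs = [q_min + i * (q_max - q_min) / (steps - 1) for i in range(steps)]
    ps = [boltzmann_p1(q, 1.0, temperature) for q in qs]
    return qs, ps


def plot_curves(temperatures=(0.2, 1.0, 100.0)):
    """Plot p(a_1) versus Q(a_1); the temperatures are arbitrary examples."""
    import matplotlib.pyplot as plt  # imported here so the math runs without it

    for t in temperatures:
        qs, ps = curve(t)
        plt.plot(qs, ps, label=f"T = {t}")
    plt.xlabel("Q(a_1)")
    plt.ylabel("p(a_1)")
    plt.legend()
    plt.show()
```

Calling plot_curves() draws one curve per temperature. Substitute your own values of T and interpret the resulting shapes yourself; the sketch only evaluates the formula.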

  

Source: Anderson, Charles W. - Department of Computer Science, Colorado State University

 

Collections: Computer Technologies and Information Sciences