Summary: Homework 3
Due November 4, Thursday
1. Consider a model with S = {s1, s2}, A_s1 = {a11, a12}, A_s2 = {a21, a22, a23},
p{s1|s1, a11} = 1, p{s1|s1, a12} = 0.5, p{s1|s2, a21} = 1, p{s1|s2, a22} = 0,
and p{s1|s2, a23} = 0.75.
a. Determine the chain structure of each deterministic stationary policy.
b. Compute the optimal long-run average gain. What is the long-run average
optimal policy?
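One way to check an answer to part a is to enumerate the six deterministic stationary policies and classify each induced two-state chain by its recurrent classes. The following is a minimal sketch, not a prescribed method: the helper names (`transition_matrix`, `recurrent_classes`, `chain_structure`, `classify_all`) and the three labels are assumptions made for illustration, and p{s2|s, a} is taken to be 1 - p{s1|s, a}, since the model has only two states.

```python
from itertools import product

# p{s1 | s, a} as given in Problem 1.
# Assumption: p{s2 | s, a} = 1 - p{s1 | s, a}, because S has two states.
P_TO_S1 = {
    "s1": {"a11": 1.0, "a12": 0.5},
    "s2": {"a21": 1.0, "a22": 0.0, "a23": 0.75},
}
STATES = ["s1", "s2"]


def transition_matrix(policy):
    """Build the 2x2 transition matrix induced by a deterministic policy.

    `policy` maps each state name to the action chosen in that state.
    """
    rows = []
    for s in STATES:
        p = P_TO_S1[s][policy[s]]
        rows.append([p, 1.0 - p])
    return rows


def reachable(P, i):
    """Return the set of state indices reachable from i (including i)."""
    seen = {i}
    stack = [i]
    while stack:
        u = stack.pop()
        for v, p_uv in enumerate(P[u]):
            if p_uv > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def recurrent_classes(P):
    """Return the closed communicating classes of a finite chain."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes = set()
    for i in range(n):
        # In a finite chain, i is recurrent iff every state reachable
        # from i can reach i again; its class is then reach[i].
        if all(i in reach[j] for j in reach[i]):
            classes.add(frozenset(reach[i]))
    return classes


def chain_structure(P):
    """Label the chain (labels are illustrative, hypothetical names)."""
    classes = recurrent_classes(P)
    if len(classes) > 1:
        return "multichain"
    if sum(len(c) for c in classes) == len(P):
        return "recurrent"  # single class, no transient states
    return "unichain"       # single recurrent class plus transient states


def classify_all():
    """Classify every deterministic stationary policy (a_s1, a_s2)."""
    result = {}
    for a1, a2 in product(P_TO_S1["s1"], P_TO_S1["s2"]):
        P = transition_matrix({"s1": a1, "s2": a2})
        result[(a1, a2)] = chain_structure(P)
    return result


if __name__ == "__main__":
    for policy, label in classify_all().items():
        print(policy, label)
```

The sketch covers the chain structure only. Part b also needs the reward function of the model, which is not restated here, so no gain is computed.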
2. Show that if all stationary deterministic policies are unichain, then all
stationary randomized policies are unichain.
3. A decision maker observes a discrete time system which moves between
states {s1, s2, s3, s4} according to the following transition probability matrix:
P =
0.3 0.4 0.2 0.1