Summary of Notation
Notation for the book draft by Sutton and Barto, with additions by David Peterson.
$t = 0, 1, 2, \dots$: discrete time step
Basic Random Variables
$s_t \in \mathcal{S}$: state at time $t$
$a_t \in \mathcal{A}(s_t)$: action at time $t$
$r_t \in \mathbb{R}$: reward at time $t$, which, like $s_t$, results from $s_{t-1}$ and $a_{t-1}$
$R_t \in \mathbb{R}$: return following time $t$ (Section 2.5)
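As a rough illustration of $R_t$, the sketch below assumes the discounted form $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots$ over a finite episode, with a discount rate $\gamma$. Neither $\gamma$ nor this exact form is defined in this summary, so Section 2.5 should be checked for the definition the text actually uses; the function name `discounted_return` is hypothetical.

```python
from typing import Sequence


def discounted_return(future_rewards: Sequence[float], gamma: float) -> float:
    """Compute R_t from the rewards r_{t+1}, r_{t+2}, ..., r_T that follow time t.

    Assumes the discounted definition R_t = sum_k gamma**k * r_{t+k+1}
    for a finite episode ending at time T (an assumption, see lead-in).
    """
    total = 0.0
    # Accumulate from the last reward backwards: R <- r + gamma * R.
    for r in reversed(future_rewards):
        total = r + gamma * total
    return total


# Rewards r_{t+1}, r_{t+2}, r_{t+3} observed after time t.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Working backwards lets each step apply a single factor of $\gamma$, so no powers of $\gamma$ need to be computed explicitly.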
Timeless Environmental Quantities
$p^a_{s,s'}$: probability of transition from state $s$ to state $s'$ under action $a$
$\rho(s, a)$: expected immediate reward from state $s$ after taking action $a$
$\pi$: a policy
$\pi(s, a)$: probability of taking action $a$ in state $s$ under policy $\pi$
$V^\pi(s)$: value of state $s$ under policy $\pi$ (expected return)
$V^*(s)$: value of state $s$ under the optimal policy
$Q^\pi(s, a)$: value of taking action $a$ in state $s$ under policy $\pi$
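One way to see how these quantities fit together is through the Bellman equations for a finite MDP. Assuming a discount rate $\gamma < 1$ (not part of this summary), $Q^\pi(s, a) = \rho(s, a) + \gamma \sum_{s'} p^a_{s,s'} V^\pi(s')$ and $V^\pi(s) = \sum_a \pi(s, a) Q^\pi(s, a)$, while $V^*(s) = \max_a \left[\rho(s, a) + \gamma \sum_{s'} p^a_{s,s'} V^*(s')\right]$. The sketch below applies these backups repeatedly (iterative policy evaluation and value iteration) to a tiny, made-up two-state MDP. The tables `P`, `RHO`, `PI` and all function names are hypothetical illustrations, not notation or code from the book.

```python
from typing import Callable, Dict, Tuple

State = str
Action = str
Values = Dict[State, float]

STATES = ["A", "B"]
ACTIONS = ["stay", "go"]

# Hypothetical toy MDP. P[(s, a)][s2] plays the role of p^a_{s,s'}.
P: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("A", "stay"): {"A": 1.0},
    ("A", "go"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "go"): {"A": 0.5, "B": 0.5},
}
# RHO[(s, a)] plays the role of rho(s, a): expected immediate reward.
RHO: Dict[Tuple[State, Action], float] = {
    ("A", "stay"): 0.0,
    ("A", "go"): 1.0,
    ("B", "stay"): 2.0,
    ("B", "go"): 0.0,
}
# PI[s][a] plays the role of pi(s, a): probability of choosing a in s.
PI: Dict[State, Dict[Action, float]] = {
    "A": {"stay": 0.5, "go": 0.5},
    "B": {"stay": 1.0, "go": 0.0},
}


def q_value(s: State, a: Action, V: Values, gamma: float) -> float:
    """rho(s, a) + gamma * sum over s' of p^a_{s,s'} * V(s')."""
    return RHO[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def iterate(backup: Callable[[State, Values], float], tol: float) -> Values:
    """Apply a one-step backup to every state until the largest change is below tol."""
    V: Values = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: backup(s, V) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            return V


def evaluate_policy(gamma: float = 0.9, tol: float = 1e-10) -> Values:
    """Approximate V^pi with the backup V(s) <- sum_a pi(s, a) * Q(s, a)."""
    return iterate(
        lambda s, V: sum(prob * q_value(s, a, V, gamma) for a, prob in PI[s].items()),
        tol,
    )


def optimal_values(gamma: float = 0.9, tol: float = 1e-10) -> Values:
    """Approximate V^* with the value-iteration backup V(s) <- max_a Q(s, a)."""
    return iterate(lambda s, V: max(q_value(s, a, V, gamma) for a in ACTIONS), tol)


print(evaluate_policy())
print(optimal_values())
```

With $\gamma = 0.9$, the policy always stays in B and earns 2 per step, so $V^\pi(B) = 2 / (1 - 0.9) = 20$, and solving $V^\pi(A) = 0.45\,V^\pi(A) + 9.5$ gives $V^\pi(A) = 190/11 \approx 17.27$. The optimal values come out near $V^*(A) = 19$ and $V^*(B) = 20$, so $V^*(s) \ge V^\pi(s)$ in both states, as expected.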