Filtered Reinforcement Learning Douglas Aberdeen
 

Summary: Filtered Reinforcement Learning
Douglas Aberdeen
National ICT Australia, Canberra, Australia
douglas.aberdeen@nicta.com.au
WWW home page: http://csl.anu.edu.au/~daa/
Abstract. Reinforcement learning (RL) algorithms attempt to assign
the credit for rewards to the actions that contributed to them.
Thus far, credit assignment has been done in one of two ways: uniformly,
or using a discounting model that assigns exponentially more credit to
recent actions. This paper demonstrates an alternative approach to
temporal credit assignment, taking advantage of exact or approximate prior
information about correct credit assignment. Infinite impulse response
(IIR) filters are used to model credit assignment information. IIR filters
generalise exponentially discounted eligibility traces to arbitrary credit
assignment models. This approach can be applied to any RL algorithm
that employs an eligibility trace. The use of IIR credit assignment filters
is explored using both the GPOMDP policy-gradient algorithm and the
Sarsa(λ) temporal-difference algorithm. A reduction in the bias and variance
of value or gradient estimates is demonstrated, resulting in faster
convergence to better policies.
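The key observation in the abstract is that the usual exponentially discounted trace, z_t = β z_{t-1} + x_t (with x_t the per-step gradient of log π(a_t | o_t)), is one particular IIR filter, so choosing other filter coefficients encodes other credit-assignment models. The Python sketch below illustrates that idea under stated assumptions. The names `IIRTrace` and `gradient_estimate`, and the example coefficients, are hypothetical and not taken from the paper. The filter follows the same difference-equation convention as `scipy.signal.lfilter` (normalised by a[0]). The estimator is a simplified GPOMDP-style running average of r_t z_t, not the paper's exact algorithm. The "delayed" filter is a pure FIR filter, which is the special case of an IIR filter with no feedback terms.

```python
import numpy as np


class IIRTrace:
    """Eligibility trace produced by a general IIR filter (illustrative sketch).

    Implements the difference equation
        a[0]*z_t = sum_{k=0}^{n} b[k]*x_{t-k} - sum_{k=1}^{m} a[k]*z_{t-k}
    where x_t is the per-step input (e.g. the gradient of log pi(a_t | o_t))
    and z_t is the trace. With b = [1] and a = [1, -beta] this reduces to the
    usual exponentially discounted trace z_t = beta*z_{t-1} + x_t.
    """

    def __init__(self, b, a, dim):
        self.b = np.asarray(b, dtype=float)
        self.a = np.asarray(a, dtype=float)
        # Newest first: x_hist[k] holds x_{t-k}; z_hist[k] holds z_{t-1-k}.
        self.x_hist = [np.zeros(dim) for _ in range(len(self.b))]
        self.z_hist = [np.zeros(dim) for _ in range(len(self.a) - 1)]

    def update(self, x):
        """Feed in x_t and return the filtered trace z_t."""
        self.x_hist = [np.asarray(x, dtype=float)] + self.x_hist[:-1]
        feed_forward = sum(bk * xk for bk, xk in zip(self.b, self.x_hist))
        feedback = sum(ak * zk for ak, zk in zip(self.a[1:], self.z_hist))
        z = (feed_forward - feedback) / self.a[0]
        if self.z_hist:
            self.z_hist = [z] + self.z_hist[:-1]
        return z


def gradient_estimate(grad_log_probs, rewards, trace):
    """GPOMDP-style running average of r_t * z_t, with z_t from `trace`.

    Assumes rewards[t] is the reward received after the action whose
    log-probability gradient is grad_log_probs[t].
    """
    delta = np.zeros_like(np.asarray(grad_log_probs[0], dtype=float))
    for t, (g, r) in enumerate(zip(grad_log_probs, rewards)):
        z = trace.update(g)
        delta += (r * z - delta) / (t + 1)
    return delta


# Two credit-assignment models (hypothetical settings, not from the paper):
beta = 0.9
discounted = IIRTrace(b=[1.0], a=[1.0, -beta], dim=2)  # standard discounted trace
delayed = IIRTrace(b=[0.0, 0.0, 1.0], a=[1.0], dim=2)  # credit only the action 2 steps back
```

Passing `delayed` instead of `discounted` to `gradient_estimate` changes only how credit is spread over past actions. The rest of the estimator is untouched, which mirrors the abstract's point that the filter can replace the trace in any algorithm that already uses one.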

  

Source: Aberdeen, Douglas - National ICT Australia & Computer Sciences Laboratory, Australian National University

 

Collections: Computer Technologies and Information Sciences