Scaling Internal-State Policy-Gradient Methods for POMDPs

Summary:
Douglas Aberdeen douglas.aberdeen@anu.edu.au
Research School of Information Science and Engineering, Australian Nat. University, ACT 0200, Australia
Jonathan Baxter jbaxter@panscient.com
Panscient Pty Ltd, Adelaide, Australia
Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless policies but have been less successful when memory is required. In this paper we develop several improved algorithms for learning policies with memory in an infinite-horizon setting: directly when a known model of the environment is available, and via simulation otherwise. We compare these algorithms on some large POMDPs, including noisy robot navigation and multi-agent problems.


Source: Aberdeen, Douglas - National ICT Australia & Computer Sciences Laboratory, Australian National University


Collections: Computer Technologies and Information Sciences