Summary: Scaling Internal-State Policy-Gradient Methods for POMDPs
Douglas Aberdeen douglas.aberdeen@anu.edu.au
Research School of Information Science and Engineering, Australian Nat. University, ACT 0200, Australia
Jonathan Baxter jbaxter@panscient.com
Panscient Pty Ltd, Adelaide, Australia
Abstract
Policy-gradient methods have received increased attention recently as a
mechanism for learning to act in partially observable environments. They
have shown promise for problems admitting memoryless policies but have been
less successful when memory is required. In this paper we develop several
improved algorithms for learning policies with memory in an infinite-horizon
setting: directly when a known model of the environment is available, and
via simulation otherwise. We compare these algorithms on some large POMDPs,
including noisy robot navigation and multi-agent problems.