Summary: Scaling Internal-State Policy-Gradient Methods for POMDPs
Douglas Aberdeen douglas.aberdeen@anu.edu.au
Research School of Information Sciences and Engineering, Australian National University, ACT 0200, Australia
Jonathan Baxter jbaxter@panscient.com
Panscient Pty Ltd, Adelaide, Australia
Abstract
Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless policies but have been less successful when memory is required. In this paper we develop several improved algorithms for learning policies with memory in an infinite-horizon setting: directly when a known model of the environment is available, and via simulation otherwise. We compare these algorithms on some large POMDPs, including noisy robot navigation and multi-agent problems.
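
The simulation-based approach the abstract alludes to can be pictured as a GPOMDP-style gradient estimator running over a stochastic finite-state controller, where the parameters govern both the internal-state transitions and the action choices. Below is a minimal Python sketch in that spirit; the softmax parameterisation, the toy load/unload world (the agent observes only its position, so it must remember whether it carries a load), the step sizes, and all names are illustrative assumptions, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
N = 3  # corridor length for the toy load/unload world

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def env_step(s, a):
    # World state s encodes (position, loaded); only the position is observable.
    pos, loaded = s % N, s // N
    pos = min(max(pos + (1 if a == 1 else -1), 0), N - 1)
    r = 0.0
    if pos == 0:
        loaded = 1                       # pick up at the load end
    elif pos == N - 1 and loaded:
        r, loaded = 1.0, 0               # deliver at the unload end
    return pos + N * loaded, r

def observe(s):
    return s % N                         # position only; load status is hidden

class FiniteStateController:
    """Stochastic policy with internal states g, driven by observations o."""
    def __init__(self, n_g, n_o, n_a):
        self.theta = np.zeros((n_g, n_o, n_g))  # internal-state transition params
        self.phi = np.zeros((n_g, n_a))         # action-selection params

    def step(self, g, o):
        p_g = softmax(self.theta[g, o])
        g2 = rng.choice(p_g.size, p=p_g)        # sample next internal state
        p_a = softmax(self.phi[g2])
        a = rng.choice(p_a.size, p=p_a)         # sample action from new state
        # grad-log terms for the softmax parameterisation: onehot minus probs
        d_th = np.zeros_like(self.theta)
        d_th[g, o] = -p_g
        d_th[g, o, g2] += 1.0
        d_ph = np.zeros_like(self.phi)
        d_ph[g2] = -p_a
        d_ph[g2, a] += 1.0
        return g2, a, d_th, d_ph

def gradient_estimate(policy, beta=0.8, horizon=5000):
    """One simulated trajectory; returns the running average of r_t * z_t."""
    z_th = np.zeros_like(policy.theta); z_ph = np.zeros_like(policy.phi)
    est_th = np.zeros_like(policy.theta); est_ph = np.zeros_like(policy.phi)
    g, s = 0, 0
    for t in range(1, horizon + 1):
        g, a, d_th, d_ph = policy.step(g, observe(s))
        s, r = env_step(s, a)
        z_th = beta * z_th + d_th               # discounted eligibility traces
        z_ph = beta * z_ph + d_ph
        est_th += (r * z_th - est_th) / t       # average of r_t z_t over the run
        est_ph += (r * z_ph - est_ph) / t
    return est_th, est_ph

# Usage sketch: plain gradient ascent on the single-trajectory estimate.
policy = FiniteStateController(n_g=2, n_o=N, n_a=2)
for _ in range(100):
    d_th, d_ph = gradient_estimate(policy)
    policy.theta += 5.0 * d_th
    policy.phi += 5.0 * d_ph

Two internal states suffice here because the only hidden bit is whether the agent is loaded; the discount beta trades bias against variance in the trace, exactly the tension that makes the infinite-horizon memory setting hard.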

  

Source: Aberdeen, Douglas - National ICT Australia & Computer Sciences Laboratory, Australian National University

 

Collections: Computer Technologies and Information Sciences