Summary: Journal of Machine Learning Research ? (2002) ?? Submitted 4/02; Published ??/??
Internal-State Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
Douglas Aberdeen douglas.aberdeen@anu.edu.au
Research School of Information Sciences and Engineering
Building 115, Daley Rd
Australian National University
A.C.T. 0200, Australia
Jonathan Baxter jonathannebaxter@yahoo.com
Panscient Pty Ltd
Adelaide, Australia
Editor: ??
Abstract
Policy-gradient algorithms are attractive as a scalable approach to learning approximate
policies for controlling partially observable Markov decision processes (POMDPs). POMDPs
can be used to model a wide variety of learning problems, from robot navigation to speech
recognition to stock trading. The downside of this generality is that exact algorithms are
computationally intractable, motivating approximate methods. Existing policy-gradient
methods have worked well for problems admitting good memory-less solutions, but have
failed to scale to large problems requiring memory. This paper develops novel algorithms …
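
The abstract's central idea, a policy-gradient learner whose controller carries a learned internal state, can be pictured as a finite-state controller trained by stochastic gradient ascent. The sketch below is our own illustration, not the authors' code: it trains a two-memory-value controller on a toy "load/unload" corridor POMDP (a standard example of a problem that defeats memory-less policies), applying a GPOMDP-style discounted eligibility trace jointly to the memory-transition parameters theta_g and the action parameters theta_a. All names, constants, and the toy environment are assumptions made for illustration.

    # Internal-state policy-gradient sketch (illustrative assumptions throughout).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy POMDP: a corridor of N cells.  The agent picks up a load at the right
    # end and is rewarded for delivering it at the left end.  The observation is
    # the position only, so the "loaded" flag is hidden and must be remembered.
    N = 4
    def step(pos, loaded, a):          # a in {0: left, 1: right}
        pos = max(0, pos - 1) if a == 0 else min(N - 1, pos + 1)
        r = 0.0
        if pos == N - 1:
            loaded = True              # pick up the load
        if pos == 0 and loaded:
            r, loaded = 1.0, False     # deliver it
        return pos, loaded, r

    G, Y, A = 2, N, 2                  # memory values, observations, actions
    theta_g = np.zeros((G, Y, G))      # logits of phi(h | g, y): memory update
    theta_a = np.zeros((G, Y, A))      # logits of mu(a | h, y): action choice

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    beta, lr, T = 0.95, 0.05, 200_000  # trace decay, step size, training steps
    z_g = np.zeros_like(theta_g)       # eligibility traces for both
    z_a = np.zeros_like(theta_a)       # parameter sets
    pos, loaded, g = 0, False, 0
    avg = 0.0
    for t in range(T):
        y = pos
        p_g = softmax(theta_g[g, y])   # sample next memory value h
        h = rng.choice(G, p=p_g)
        p_a = softmax(theta_a[h, y])   # sample action a given new memory
        a = rng.choice(A, p=p_a)
        # Decay the traces, then add the softmax score (one-hot minus probs).
        z_g *= beta
        z_g[g, y] -= p_g
        z_g[g, y, h] += 1.0
        z_a *= beta
        z_a[h, y] -= p_a
        z_a[h, y, a] += 1.0
        pos, loaded, r = step(pos, loaded, a)
        # GPOMDP-style update: reward times discounted eligibility trace.
        theta_g += lr * r * z_g
        theta_a += lr * r * z_a
        g = h
        avg += (r - avg) / (t + 1)

    print(f"average reward per step: {avg:.3f}")

Because the observation reveals only the position, a memory-less policy cannot distinguish the loaded from the unloaded pass through the corridor; the learned internal state g plays exactly the role the abstract assigns to memory, and the trace decay beta trades gradient bias against variance just as in memory-less policy-gradient methods.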

  

Source: Aberdeen, Douglas - National ICT Australia & Computer Sciences Laboratory, Australian National University

 

Collections: Computer Technologies and Information Sciences