Summary: Policy-Gradient Algorithms for
Partially Observable Markov
Decision Processes
Douglas Alexander Aberdeen
A thesis submitted for the degree of
Doctor of Philosophy at
The Australian National University
April 2003
© Douglas Alexander Aberdeen
Typeset in Computer Modern by TeX and LaTeX 2ε.
Except where otherwise indicated, this thesis is my own original work.
Douglas Alexander Aberdeen
25 April 2003
Acknowledgements
Academic
Primary thanks go to Jonathan Baxter, my main advisor, who kept up his supervision
despite going to work in the "real world." The remainder of my panel was Sylvie
Thiébaux, Peter Bartlett, and Bruce Millar, all of whom gave invaluable advice. Thanks
also to Bob Edwards for constructing the "Bunyip" Linux cluster and co-authoring the
paper that is the basis of Chapter 11.