The Center for Control, Dynamical Systems, and Computation University of California at Santa Barbara

Spring 2009 Seminar Series
Q-learning and Pontryagin's Minimum Principle
Sean Meyn
University of Illinois at Urbana-Champaign
Friday, May 15, 2009 3:00 - 4:00pm WEBB 1100
Abstract: Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based
on observations of the system controlled using a non-optimal policy. It has proven to be effective for models
with finite state and action spaces. In this talk we demonstrate how the construction of the algorithm is identical
to concepts from more classical nonlinear control theory, in particular Jacobson & Mayne's differential dynamic
programming introduced in the 1960s. I will show how Q-learning can be extended to deterministic and Markovian
systems in continuous time, with general state and action spaces. The main ideas are summarized as follows.
(i) Watkins' "Q-function" is an extension of the Hamiltonian that appears in the Minimum Principle. Based on
this observation we obtain extensions of Watkins' algorithm to approximate the Hamiltonian within a prescribed
finite-dimensional function class. (ii) A transformation of the optimality equations is performed based on the
adjoint of a resolvent operator. This is used to construct a consistent algorithm based on stochastic approximation
that requires only causal filtering of the time-series data. (iii) Examples are presented to illustrate the application
of these techniques, including application to distributed control of multi-agent systems. Reference: P. Mehta and
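To make the off-policy setting in the abstract concrete, the following is a minimal sketch of Watkins' tabular Q-learning on a toy two-state, two-action Markov decision process. The chain is driven by a fixed randomized (non-optimal) behavior policy while the Q-function is estimated from the observed transitions. The dynamics, costs, discount factor, and step-size schedule below are illustrative assumptions, not taken from the talk; the continuous-time, general-state extensions discussed in the seminar go well beyond this finite case.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor (assumed for illustration)

# P[a][s, s'] = transition probability under action a; c[s, a] = one-step cost.
# These numbers are arbitrary, chosen only to give a well-posed toy MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])

Q = np.zeros((n_states, n_actions))
s = 0
for k in range(50_000):
    a = int(rng.integers(n_actions))          # behavior policy: uniform random
    s_next = int(rng.choice(n_states, p=P[a][s]))
    alpha = 100.0 / (100.0 + k)               # diminishing step size
    # Watkins' update: bootstrap with the minimizing action (cost formulation)
    Q[s, a] += alpha * (c[s, a] + gamma * Q[s_next].min() - Q[s, a])
    s = s_next

# Greedy (minimizing) policy with respect to the learned Q-function
policy = Q.argmin(axis=1)
print(Q)
print(policy)
```

The key point, mirrored in the abstract, is that the data come from a non-optimal policy yet the update bootstraps through the minimum over actions, so the iterates track the optimality equation rather than the behavior policy's value.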


Source: Akhmedov, Azer - Department of Mathematics, University of California at Santa Barbara


Collections: Mathematics