 
The Center for Control, Dynamical Systems, and Computation
University of California at Santa Barbara
Spring 2009 Seminar Series
Presents
Q-learning and Pontryagin's Minimum Principle
Sean Meyn
University of Illinois at Urbana-Champaign
Friday, May 15, 2009, 3:00-4:00pm, WEBB 1100
Abstract: Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It has proven to be effective for models with finite state and action spaces. In this talk we demonstrate how the construction of the algorithm is closely tied to concepts from more classical nonlinear control theory, in particular Jacobson & Mayne's differential dynamic programming introduced in the 1960s. I will show how Q-learning can be extended to deterministic and Markovian systems in continuous time, with general state and action spaces. The main ideas are summarized as follows.
(i) Watkins' "Q-function" is an extension of the Hamiltonian that appears in the Minimum Principle. Based on this observation we obtain extensions of Watkins' algorithm to approximate the Hamiltonian within a prescribed finite-dimensional function class.
(ii) A transformation of the optimality equations is performed based on the adjoint of a resolvent operator. This is used to construct a consistent algorithm based on stochastic approximation that requires only causal filtering of the time-series data.
(iii) Examples are presented to illustrate the application of these techniques, including application to distributed control of multi-agent systems.
Reference: P. Mehta and
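For readers unfamiliar with the finite state-and-action setting the abstract starts from, here is a minimal sketch of Watkins' tabular Q-learning. The two-state controlled Markov chain below (transition matrix `P` and reward table `R`) is a hypothetical toy example, not a model from the talk; the chain is driven by a non-optimal (uniformly random) policy while the Q-function is estimated.

```python
import numpy as np

# Toy 2-state, 2-action controlled Markov chain (values invented for illustration).
# P[s, a] is the next-state distribution; R[s, a] is the one-step reward.
np.random.seed(0)
n_states, n_actions = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9                                   # discount factor

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(1, 50001):
    a = np.random.randint(n_actions)          # behave with a non-optimal (random) policy
    s_next = np.random.choice(n_states, p=P[s, a])
    alpha = 1.0 / t**0.6                      # diminishing step size
    # Watkins' update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

policy = Q.argmax(axis=1)                     # greedy policy read off the learned Q
```

The point of the sketch is that the learned `Q` is estimated entirely from the observed trajectory, without knowing `P` or `R`; the extensions discussed in the talk replace the table `Q` with an approximation of the Hamiltonian in a prescribed finite-dimensional function class.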
