Near-optimal Regret Bounds for Reinforcement Learning

Peter Auer, Thomas Jaksch, Ronald Ortner
University of Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria
{auer,tjaksch,rortner}@unileoben.ac.at

December 19, 2007
Abstract
For undiscounted reinforcement learning we consider the total regret of a learning
algorithm with respect to an optimal policy. We present a reinforcement learning
algorithm with total regret $\tilde{O}(DS\sqrt{AT})$ after $T$ steps for any unknown
Markov decision process (MDP) with $S$ states, $A$ actions per state, and diameter $D$.
The diameter of an MDP is at most $D$ if for any pair of states $s_1, s_2$ there is a
policy which moves from $s_1$ to $s_2$ in at most $D$ steps (on average). Our upper
bound holds with high probability, and it can be converted into a logarithmic regret
bound if a fixed difference between the average reward of the optimal and the
second-best policy is assumed.
We also present a corresponding lower bound of $\Omega(\sqrt{DSAT})$ on the worst-case
total regret of any learning algorithm.
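For reference, the two quantities appearing in these bounds can be written out
explicitly. The following is a sketch in standard notation consistent with the
abstract's description; the symbols $\rho^*$ (optimal average reward), $r_t$ (reward
collected at step $t$), and $T(s' \mid M, \pi, s)$ (first hitting time of $s'$ when
following policy $\pi$ in MDP $M$ starting from $s$) are labels assumed here rather
than taken from this page:

% Total regret after T steps, measured against the optimal average reward rho^*
% (assumed notation; rho^* and r_t are not defined on this page):
\Delta(M, T) := T\rho^* - \sum_{t=1}^{T} r_t

% Diameter of M: worst case over ordered state pairs of the expected time to
% reach s' from s under the policy that reaches it fastest on average:
D(M) := \max_{s \neq s'} \min_{\pi} \mathbb{E}\big[ T(s' \mid M, \pi, s) \big]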

  

Source: Auer, Peter - Department Mathematik und Informationstechnologie, Montanuniversität Leoben

 

Collections: Computer Technologies and Information Sciences