Deviations of stochastic bandit regret
 

Summary: Deviations of stochastic bandit regret
Antoine Salomon (1) and Jean-Yves Audibert (1,2)
(1) Imagine, École des Ponts ParisTech, Université Paris Est
salomona@imagine.enpc.fr, audibert@imagine.enpc.fr
(2) Sierra, CNRS/ENS/INRIA, Paris, France
Abstract. This paper studies the deviations of the regret in a stochastic
multi-armed bandit problem. When the total number of plays n is known
beforehand by the agent, Audibert et al. (2009) exhibit a policy such
that with probability at least 1 - 1/n, the regret of the policy is of order
log n. They have also shown that such a property is not shared by the
popular UCB1 policy of Auer et al. (2002). This work first answers an
open question: it extends this negative result to any anytime policy. The
second contribution of this paper is to design anytime robust policies for
specific multi-armed bandit problems in which some restrictions are put
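
For readers unfamiliar with the setting, the following is a minimal Python sketch of the UCB1 index policy of Auer et al. (2002) together with the pseudo-regret it incurs. It is an illustration only, not the robust policies studied in the paper. The function name `ucb1_pseudo_regret`, the Bernoulli reward model, and the `seed` parameter are assumptions made for this example, and the regret notion used in the paper may differ in detail from the pseudo-regret computed here.

```python
import math
import random


def ucb1_pseudo_regret(means, n, seed=0):
    """Simulate UCB1 for n plays on Bernoulli arms (assumed reward model).

    Returns (pseudo_regret, counts), where pseudo_regret is the sum over
    arms of (best mean - arm mean) * number of plays of that arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm has been played
    sums = [0.0] * k   # total reward collected from each arm

    for t in range(1, n + 1):
        if t <= k:
            # Initialisation: play each arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus bonus sqrt(2 ln t / T_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    regret = sum((best - means[i]) * counts[i] for i in range(k))
    return regret, counts


if __name__ == "__main__":
    # Hypothetical two-armed instance; the means are chosen for illustration.
    regret, counts = ucb1_pseudo_regret([0.5, 0.6], n=10_000, seed=1)
    print("pseudo-regret:", regret, "plays per arm:", counts)
```

Repeating the call over many seeds gives an empirical distribution of the regret, which is the kind of quantity whose deviations the abstract discusses.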

  

Source: Audibert, Jean-Yves - Département d'Informatique, École Normale Supérieure

 

Collections: Computer Technologies and Information Sciences