
Best Arm Identification in Multi-Armed Bandits Jean-Yves Audibert

Summary: Best Arm Identification in Multi-Armed Bandits
Jean-Yves Audibert
Imagine, Université Paris Est
Willow, CNRS/ENS/INRIA, Paris, France
Sébastien Bubeck, Rémi Munos
SequeL Project, INRIA Lille
40 avenue Halley,
59650 Villeneuve d'Ascq, France
{sebastien.bubeck, remi.munos}@inria.fr
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The
regret of a forecaster is here defined by the gap between the mean reward of the optimal arm
and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy
and a new algorithm based on successive rejects. We show that these algorithms are essentially
optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the
best possible. However, while the UCB policy needs the tuning of a parameter depending on the
unobservable hardness of the task, the successive rejects policy benefits from being parameter-free,
and also independent of the scaling of the rewards. As a by-product of our analysis, we show
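
To make the successive rejects idea concrete, here is a minimal sketch. It is not taken from this summary, which does not spell out the procedure. It assumes the phase schedule usually attributed to the full paper: with K arms and a budget of n pulls, phase k samples every surviving arm until it has n_k = ceil((n - K) / (logbar(K) (K + 1 - k))) pulls, where logbar(K) = 1/2 + sum_{i=2..K} 1/i, and then discards the arm with the lowest empirical mean. The names `successive_rejects`, `pull`, `num_arms` and `budget` are illustrative only.

```python
import math
import random


def successive_rejects(pull, num_arms, budget):
    """Return the index of the arm recommended after at most `budget` pulls.

    `pull(i)` is a caller-supplied (hypothetical) function returning one
    reward sample for arm i. The phase lengths below are an assumption
    based on the commonly cited schedule, not on the summary text.
    """
    if num_arms < 2 or budget <= num_arms:
        raise ValueError("need at least 2 arms and budget > num_arms")

    # logbar(K) = 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))

    active = list(range(num_arms))
    sums = [0.0] * num_arms    # cumulative reward per arm
    counts = [0] * num_arms    # number of pulls per arm
    prev_n = 0

    for k in range(1, num_arms):
        # Target number of pulls per surviving arm at the end of phase k.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - k)))
        for arm in active:
            for _ in range(n_k - prev_n):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Reject the surviving arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0]


if __name__ == "__main__":
    # Illustrative Bernoulli arms; the means are made up for the demo.
    means = [0.3, 0.5, 0.7]
    rng = random.Random(0)
    chosen = successive_rejects(
        lambda i: 1.0 if rng.random() < means[i] else 0.0, len(means), 300
    )
    # Regret as defined in the summary: best mean minus chosen arm's mean.
    print("chosen arm:", chosen, "regret:", max(means) - means[chosen])
```

The sketch takes no tuning parameter, and rejection depends only on the ordering of the empirical means, so rescaling all rewards by a positive constant leaves the recommendation unchanged. This matches the parameter-free and scale-independent properties claimed above.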


Source: Audibert, Jean-Yves - Département d'Informatique, École Normale Supérieure


Collections: Computer Technologies and Information Sciences