Exploration-exploitation trade-off using variance estimates in multi-armed bandits

Summary:
Jean-Yves Audibert
Université Paris-Est, Ecole des Ponts ParisTech, CERTIS
6 avenue Blaise Pascal, 77455 Marne-la-Vallée, France
Willow - ENS / INRIA
45 rue d'Ulm, 75005 Paris, France
Rémi Munos
INRIA Lille - Nord Europe, SequeL project,
40 avenue Halley, 59650 Villeneuve d'Ascq, France
Csaba Szepesvári
Department of Computing Science
University of Alberta
Edmonton T6G 2E8, Canada
Algorithms based on upper confidence bounds for balancing exploration and
exploitation are gaining popularity since they are easy to implement, efficient
and effective. This paper considers a variant of the basic algorithm for the
stochastic, multi-armed bandit problem that takes into account the empirical


Source: Audibert, Jean-Yves - Département d'Informatique, École Normale Supérieure


Collections: Computer Technologies and Information Sciences