Summary: Tuning bandit algorithms in stochastic
environments
Jean-Yves Audibert¹, Rémi Munos², and Csaba Szepesvári³
¹ CERTIS - Ecole des Ponts
19, rue Alfred Nobel - Cité Descartes
77455 Marne-la-Vallée - France
audibert@certis.enpc.fr
² INRIA Futurs Lille, SequeL project,
50 avenue Halley, 59650 Villeneuve d'Ascq, France
remi.munos@inria.fr
³ University of Alberta, Edmonton T6G 2E8, Canada
szepesva@cs.ualberta.ca
Abstract. Algorithms based on upper-confidence bounds for balancing
exploration and exploitation are gaining popularity since they are easy
to implement, efficient and effective. In this paper we consider a variant