968 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-32, NO. 11, NOVEMBER 1987

Summary:
Asymptotically Efficient Allocation Rules for the Multiarmed Bandit Problem with Multiple Plays-Part I: I.I.D. Rewards
Abstract-At each instant of time we are required to sample a fixed
number m ≥ 1 out of N i.i.d. processes whose distributions belong to a
family suitably parameterized by a real number θ. The objective is to
maximize the long run total expected value of the samples. Following Lai
and Robbins, the learning loss of a sampling scheme corresponding to a
configuration of parameters C = (θ_1, ..., θ_N) is quantified by the regret
R_n(C). This is the difference between the maximum expected reward at
time n that could be achieved if C were known and the expected reward
actually obtained by the sampling scheme. We provide a lower bound for
the regret associated with any uniformly good scheme, and construct a
scheme which attains the lower bound for every configuration C. The
lower bound is given explicitly in terms of the Kullback-Leibler number
between pairs of distributions. Part II of this paper considers the same
problem when the reward processes are Markovian.
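The setting described in the abstract can be illustrated with a small simulation: at each step an allocation rule plays m of the N arms, and its regret R_n(C) is the gap between the reward of an oracle that always plays the m best arms and the reward the rule actually collects. The sketch below uses Bernoulli arms and a generic UCB1-style index as the allocation rule; this index, and the function and parameter names, are illustrative assumptions, not the KL-based rule constructed in the paper.

```python
import math
import random

def simulate_multiplay_bandit(means, m, horizon, seed=0):
    """Toy m-plays-per-step Bernoulli bandit. The allocation rule is a
    UCB1-style index (an assumption for illustration; the paper's
    asymptotically efficient rule is based on Kullback-Leibler numbers).
    Returns the empirical regret after `horizon` steps."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n    # number of times each arm was sampled
    sums = [0.0] * n    # total reward collected from each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            # unplayed arms get +inf so every arm is tried at least once
            if counts[i] == 0:
                return float('inf')
            return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        # play the m arms with the largest indices
        chosen = sorted(range(n), key=index, reverse=True)[:m]
        for i in chosen:
            r = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += r
            total_reward += r
    # oracle knowing C plays the m arms of largest mean at every step
    oracle_reward = horizon * sum(sorted(means, reverse=True)[:m])
    return oracle_reward - total_reward
```

For a configuration with two clearly best arms, e.g. `simulate_multiplay_bandit([0.9, 0.8, 0.3, 0.2], m=2, horizon=2000)`, the regret stays far below the linear worst case, consistent with the logarithmic growth the paper's lower bound dictates.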
I. INTRODUCTION
IN this paper we study a version of the multiarmed bandit

  

Source: Anantharam, Venkat - Department of Electrical Engineering and Computer Sciences, University of California at Berkeley

 

Collections: Engineering