Summary: Proceedings of 25th Conference on Decision and Control
Athens, Greece - December 1986, WA10 - 10:15
Asymptotically Efficient Rules in Multiarmed Bandit Problems
V. Anantharam and P. Varaiya
Department of Electrical Engineering and Computer Sciences
and Electronics Research Laboratory
University of California, Berkeley CA 94720
Setup
We are given N discrete-time real-valued stochastic processes
$$
\begin{aligned}
X^1 &: X^1(1),\ X^1(2),\ \ldots \\
&\ \ \vdots \\
X^N &: X^N(1),\ X^N(2),\ \ldots
\end{aligned}
$$
The essential assumption is that these processes are independent. For
historical reasons these processes are also called arms.
A fixed number $m$, $1 \le m < N$, is specified. At each time $t$ we
must select $m$ different arms. Let $T^j(t)$ be the number of times that
arm $j$ was selected during the interval $1, \ldots, t$; and let
$U(t) \subset \{1, \ldots, N\}$ be the $m$ arms that are selected at time $t$. Then
at time $t$ we receive the reward