Contextual Multi-armed Bandits under Feature Uncertainty
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)
We study contextual multi-armed bandit problems under linear realizability on rewards and uncertainty (or noise) on features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, having O(T⁷/₈(log(dT)+K√d)) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that popular linear hypotheses including NLinRel are impossible to achieve such sub-linear regret. Instead, under assumption of Gaussian feature vectors, we prove that a greedy algorithm has O(T²/₃√log d)regret bound with respect to the optimal linear hypothesis. Utilizing our theoretical understanding on the Gaussian case, we also design a practical variant of NLinRel, coined Universal-NLinRel, for arbitrary feature distributions. It first runs NLinRel for finding the ‘true’ coefficient vector using feature uncertainties and then adjust it to minimize its regret using the statistical feature information. We justify the performance of Universal-NLinRel on both synthetic and real-world datasets.
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC52-06NA25396
- OSTI ID:
- 1345927
- Report Number(s):
- LA-UR-17-21865
- Country of Publication:
- United States
- Language:
- English
Similar Records
Learning infinite-horizon average-reward restless multi-action bandits via index awareness
Multi-armed bandit with sub-exponential rewards