OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Contextual Multi-armed Bandits under Feature Uncertainty

Technical Report
DOI: https://doi.org/10.2172/1345927 · OSTI ID: 1345927
 [1];  [2];  [2];  [2]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Korea Advanced Inst. of Science and Technology (KAIST), Daejeon (Korea, Republic of)

We study contextual multi-armed bandit problems under linear realizability of rewards and uncertainty (or noise) in the observed features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, achieving an O(T^{7/8}(log(dT) + K√d)) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that popular linear hypotheses, including NLinRel, cannot achieve such a sub-linear regret. Instead, under the assumption of Gaussian feature vectors, we prove that a greedy algorithm has an O(T^{2/3}√(log d)) regret bound with respect to the optimal linear hypothesis. Utilizing our theoretical understanding of the Gaussian case, we also design a practical variant of NLinRel, coined Universal-NLinRel, for arbitrary feature distributions. It first runs NLinRel to find the ‘true’ coefficient vector using the feature uncertainties, and then adjusts that vector to minimize regret using the statistical feature information. We demonstrate the performance of Universal-NLinRel on both synthetic and real-world datasets.
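To make the problem setting concrete, the following is a minimal sketch of a greedy linear contextual bandit learner under identical Gaussian feature noise — the learner only sees noisy features Z but the reward is linear in the true features X. This is an illustrative simulation, not the paper's NLinRel or Universal-NLinRel algorithms; all dimensions, noise levels, and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 2000, 5, 4                 # rounds, actions, feature dimension (illustrative)
theta = rng.normal(size=d)           # unknown true coefficient vector
noise_sd = 0.1                       # identical feature noise across all actions

A = np.eye(d)                        # ridge-regularized Gram matrix
b = np.zeros(d)                      # accumulated reward-weighted features
cum_reward, cum_opt = 0.0, 0.0

for t in range(T):
    X = rng.normal(size=(K, d))                   # true (unobserved) features
    Z = X + noise_sd * rng.normal(size=(K, d))    # noisy features shown to the learner
    theta_hat = np.linalg.solve(A, b)             # current least-squares estimate
    a = int(np.argmax(Z @ theta_hat))             # greedy action on observed features
    r = X[a] @ theta + 0.05 * rng.normal()        # reward is linear in TRUE features
    cum_reward += r
    cum_opt += np.max(X @ theta)                  # oracle that sees the true features
    A += np.outer(Z[a], Z[a])                     # ridge-regression update on (Z, r)
    b += r * Z[a]

regret = cum_opt - cum_reward
```

With small identical noise, the regression on noisy features suffers only mild attenuation bias, so the greedy learner closes most of the gap to the oracle over T rounds; the non-identical-noise case analyzed in the report is exactly where this simple approach breaks down.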

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1345927
Report Number(s):
LA-UR-17-21865
Country of Publication:
United States
Language:
English

Similar Records

Output-weighted sampling for multi-armed bandits with extreme payoffs
Journal Article · April 2022 · Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Learning infinite-horizon average-reward restless multi-action bandits via index awareness
Conference · December 2022

Multi-armed bandit with sub-exponential rewards
Journal Article · August 2021 · Operations Research Letters