OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Contextual Multi-armed Bandits under Feature Uncertainty

Technical Report
DOI: https://doi.org/10.2172/1345927 · OSTI ID: 1345927
 [1];  [2];  [2];  [2]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Korea Advanced Inst. of Science and Technology (KAIST), Daejeon (Korea, Republic of)

We study contextual multi-armed bandit problems under linear realizability of rewards and uncertainty (or noise) in the observed features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, achieving an O(T^{7/8}(log(dT) + K√d)) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that popular linear hypotheses, including NLinRel, cannot achieve such a sub-linear regret. Instead, under the assumption of Gaussian feature vectors, we prove that a greedy algorithm has an O(T^{2/3}√(log d)) regret bound with respect to the optimal linear hypothesis. Utilizing our theoretical understanding of the Gaussian case, we also design a practical variant of NLinRel, coined Universal-NLinRel, for arbitrary feature distributions. It first runs NLinRel to find the ‘true’ coefficient vector using the feature uncertainties, and then adjusts that vector to minimize regret using the statistical feature information. We demonstrate the performance of Universal-NLinRel on both synthetic and real-world datasets.
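To make the problem setting concrete, the following is a minimal sketch of a greedy linear contextual bandit learner under identical Gaussian feature noise — the learner only sees noisy features Z but the reward is linear in the true features X. This is an illustrative simulation, not the paper's NLinRel or Universal-NLinRel algorithms; all dimensions, noise levels, and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 2000, 5, 4                 # rounds, actions, feature dimension (illustrative)
theta = rng.normal(size=d)           # unknown true coefficient vector
noise_sd = 0.1                       # identical feature noise across all actions

A = np.eye(d)                        # ridge-regularized Gram matrix
b = np.zeros(d)                      # accumulated reward-weighted features
cum_reward, cum_opt = 0.0, 0.0

for t in range(T):
    X = rng.normal(size=(K, d))                   # true (unobserved) features
    Z = X + noise_sd * rng.normal(size=(K, d))    # noisy features shown to the learner
    theta_hat = np.linalg.solve(A, b)             # current least-squares estimate
    a = int(np.argmax(Z @ theta_hat))             # greedy action on observed features
    r = X[a] @ theta + 0.05 * rng.normal()        # reward is linear in TRUE features
    cum_reward += r
    cum_opt += np.max(X @ theta)                  # oracle that sees the true features
    A += np.outer(Z[a], Z[a])                     # ridge-regression update on (Z, r)
    b += r * Z[a]

regret = cum_opt - cum_reward
```

With small identical noise, the regression on noisy features suffers only mild attenuation bias, so the greedy learner closes most of the gap to the oracle over T rounds; the non-identical-noise case analyzed in the report is exactly where this simple approach breaks down.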

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1345927
Report Number(s):
LA-UR-17-21865
Country of Publication:
United States
Language:
English

Similar Records

Output-weighted sampling for multi-armed bandits with extreme payoffs
Journal Article · April 2022 · Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Learning infinite-horizon average-reward restless multi-action bandits via index awareness
Conference · December 2022

Multi-armed bandit with sub-exponential rewards
Journal Article · August 2021 · Operations Research Letters