Contextual Bandits with Linear Payoff Functions

Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire; JMLR W&CP 15:208-214, 2011.

Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and $d$ dimensional feature vectors, we prove an $O(\sqrt{Td\ln^3(KT\ln(T)/\delta)})$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.

[pdf]



Home Page

Papers

Submissions

News

Scope

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Login



RSS Feed