Multi-armed Bandit Problems with History

Pannagadatta Shivaswamy, Thorsten Joachims ; JMLR W&CP 22: 1046-1054, 2012.

Abstract

In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms are demonstrated on a large-scale malicious URL detection problem.




Home Page

Papers

Submissions

News

Scope

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Login



RSS Feed

Page last modified on Thu April 26 2012 13:56 2012.

Copyright @ JMLR 2012. All rights reserved.