Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor; JMLR W&CP 9:273-280, 2010.

Abstract

Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but their quality can only be learned by experimenting with them. We consider the scenario in which the reward distribution over arms is modeled by a Gaussian process and the observed rewards are noise-free. Our main result bounds the regret of algorithms, measured relative to the a posteriori optimal strategy of playing the best arm throughout, under benign assumptions on the covariance function defining the Gaussian process. We complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general our upper bounds are loose by at most a logarithmic factor.
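To make the setting concrete, here is a minimal sketch, under stated assumptions, of how a noise-free Gaussian process draw can define the arm rewards and how regret against always playing the best arm is counted. The finite grid of arms in [0, 1], the squared-exponential covariance with length scale 0.2, the explore-then-commit strategy, and every function name (`sample_gp_rewards`, `explore_then_commit`, `cumulative_regret`) are hypothetical choices made for illustration; they are not the algorithms or covariance functions analysed in the paper.

```python
import math
import random


def se_kernel(x, y, length_scale=0.2):
    """Squared-exponential covariance (an assumed kernel, chosen for illustration)."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def cholesky(a):
    """Plain Cholesky factorisation A = L L^T of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_rewards(arms, rng, jitter=1e-6):
    """Draw one reward function f ~ GP(0, k) evaluated at a finite set of arms.

    A small jitter on the diagonal keeps the covariance numerically positive-definite.
    """
    n = len(arms)
    cov = [[se_kernel(arms[i], arms[j]) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def explore_then_commit(rewards, order, horizon):
    """Hypothetical strategy: pull each arm in `order` once, then commit to the best seen.

    Rewards are noise-free, so a single pull reveals an arm's value exactly and
    there is no reason to pull the same arm twice while exploring.
    """
    if not order:
        raise ValueError("order must contain at least one arm")
    best_seen = None
    played = []
    for t in range(horizon):
        if t < len(order):
            arm = order[t]
            # Only rewards of arms already pulled are consulted.
            if best_seen is None or rewards[arm] > rewards[best_seen]:
                best_seen = arm
        else:
            arm = best_seen
        played.append(arm)
    return played


def cumulative_regret(rewards, played):
    """Regret relative to the a posteriori optimal strategy of always playing the best arm."""
    best = max(rewards)
    return sum(best - rewards[a] for a in played)


if __name__ == "__main__":
    rng = random.Random(0)
    num_arms = 10
    arms = [i / (num_arms - 1) for i in range(num_arms)]
    rewards = sample_gp_rewards(arms, rng)
    # Explore every other arm on the grid, then commit for the remaining rounds.
    played = explore_then_commit(rewards, order=list(range(0, num_arms, 2)), horizon=50)
    print("cumulative regret:", cumulative_regret(rewards, played))
```

Because observations are exact, regret here accrues only from exploration pulls of suboptimal arms and, if the best arm was never explored, from every committed round; the correlation encoded by the covariance function is what a smarter strategy would exploit to skip arms whose values are already well predicted.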

Page last modified on Wed Mar 24 15:36 GMT 2010.

Copyright © JMLR 2000. All rights reserved.