Loading [MathJax]/jax/output/HTML-CSS/jax.js

Linear Bandits on Uniformly Convex Sets

Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta.

Year: 2021, Volume: 22, Issue: 284, Pages: 1−23


Abstract

Linear bandit algorithms yield ˜O(nT) pseudo-regret bounds on compact convex action sets KRn and two types of structural assumptions lead to better pseudo-regret bounds. When K is the simplex or an p ball with p]1,2], there exist bandits algorithms with ˜O(nT) pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond p balls that enjoy pseudo-regret bounds of ˜O(nT). This result provides new elements for the open question in Bubeck and Cesa-Bianchi, 2012. When the action set is q-uniformly convex but not necessarily strongly convex (q>2), we obtain pseudo-regret bounds ˜O(n1/qT1/p) with p s.t. 1/p+1/q=1. These pseudo-regret bounds are competitive with the general ˜O(nT) for a time horizon range that depends on the degree q>2 of the set's uniform convexity and the dimension n of the problem.

PDF BibTeX