Linear Bandits on Uniformly Convex Sets
Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta; 22(284):1−23, 2021.
Abstract
Linear bandit algorithms yield ˜O(n√T) pseudo-regret bounds on compact convex action sets K⊂Rn and two types of structural assumptions lead to better pseudo-regret bounds. When K is the simplex or an ℓp ball with p∈]1,2], there exist bandits algorithms with ˜O(√nT) pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond ℓp balls that enjoy pseudo-regret bounds of ˜O(√nT). This result provides new elements for the open question in Bubeck and Cesa-Bianchi, 2012. When the action set is q-uniformly convex but not necessarily strongly convex (q>2), we obtain pseudo-regret bounds ˜O(n1/qT1/p) with p s.t. 1/p+1/q=1. These pseudo-regret bounds are competitive with the general ˜O(n√T) for a time horizon range that depends on the degree q>2 of the set's uniform convexity and the dimension n of the problem.
© JMLR 2021. (edit, beta) |