Linear Bandits on Uniformly Convex Sets

Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta.

Year: 2021, Volume: 22, Issue: 284, Pages: 1−23


Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandits algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$. This result provides new elements for the open question in Bubeck and Cesa-Bianchi, 2012. When the action set is $q$-uniformly convex but not necessarily strongly convex ($q >2$), we obtain pseudo-regret bounds $\tilde{\mathcal{O}}(n^{1/q}T^{1/p})$ with $p$ s.t. $1/p + 1/q=1$. These pseudo-regret bounds are competitive with the general $\tilde{\mathcal{O}}(n\sqrt{T})$ for a time horizon range that depends on the degree $q>2$ of the set's uniform convexity and the dimension $n$ of the problem.