
No Weighted-Regret Learning in Adversarial Bandits with Delays

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet; 23(139):1−43, 2022.

Abstract

Consider a scenario where a player chooses an action in each round $t$ out of $T$ rounds and observes the incurred cost after a delay of $d_t$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that in a non-cooperative game, the expected weighted ergodic distribution of play converges to the set of coarse correlated equilibria (CCE) if players use algorithms that have "no weighted-regret" in the above scenario, even if they have linear regret due to excessively large delays. For a two-player zero-sum game, we show that no weighted-regret suffices for the weighted ergodic average of play to converge to the set of Nash equilibria. We prove that the FKM algorithm with $n$ dimensions achieves an expected regret of $O\left(nT^{3/4}+\sqrt{n}\,T^{1/3}D^{1/3}\right)$ and the EXP3 algorithm with $K$ arms achieves an expected regret of $O\left(\sqrt{\log K\,(KT+D)}\right)$, even when $D=\sum_{t=1}^{T}d_t$ and $T$ are unknown. These bounds use a novel doubling trick that, under mild assumptions, provably retains the regret bounds of the case where $D$ and $T$ are known. Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_t=O\left(t\log t\right)$. Therefore, algorithms with no weighted-regret can be used to approximate a CCE of a finite or convex unknown game that can only be simulated with bandit feedback, even if the simulation involves significant delays.
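As a concrete illustration of the delayed-bandit-feedback setting above, the following is a minimal Python sketch (not the paper's exact algorithm) of EXP3 run under delayed feedback with a fixed step size. The environment interface `get_cost` and all constants are assumptions made here for illustration; the paper's doubling trick additionally removes the need to know $T$ and $D$ in advance.

```python
import numpy as np

def exp3_delayed(T, K, get_cost, seed=0):
    """Sketch: EXP3 over K arms under delayed bandit feedback.

    get_cost(t, arm) -> (cost, delay) is a hypothetical interface:
    it returns the cost in [0, 1] of pulling `arm` at round t, but the
    learner only sees that feedback at round t + delay.
    """
    rng = rng = np.random.default_rng(seed)
    eta = np.sqrt(np.log(K) / (K * T))  # fixed step size; the doubling
                                        # trick avoids needing T, D upfront
    log_w = np.zeros(K)                 # log-weights, for numerical stability
    pending = {}                        # arrival round -> delayed feedbacks
    arms = np.zeros(T, dtype=int)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = int(rng.choice(K, p=p))
        arms[t] = arm
        cost, delay = get_cost(t, arm)
        # feedback for round t only becomes visible at round t + delay
        pending.setdefault(t + delay, []).append((arm, cost, p[arm]))
        # apply all feedback arriving now, using the sampling probability
        # recorded at pull time for the importance-weighted cost estimate
        for a, c, p_a in pending.pop(t, []):
            log_w[a] -= eta * c / p_a
    return arms

# Hypothetical usage: a fixed cost sequence with delays d_t = t // 10.
# arms = exp3_delayed(1000, 5, lambda t, a: (abs(np.sin(t + a)), t // 10))
```

Feedback that would arrive after round $T$ is simply never applied in this sketch; this is how excessively large delays can force linear (unweighted) regret, the situation the weighted-regret notion is designed to handle.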

© JMLR 2022.