Processing math: 100%



Home Page

Papers

Submissions

News

Editorial Board

Special Issues

Open Source Software

Proceedings (PMLR)

Data (DMLR)

Transactions (TMLR)

Search

Statistics

Login

Frequently Asked Questions

Contact Us



RSS Feed

Multi-Agent Multi-Armed Bandits with Limited Communication

Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli; 23(212):1−24, 2022.

Abstract

We consider the problem where N agents collaboratively interact with an instance of a stochastic K arm bandit problem for KN. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of ˜O((K/N+N)T), communicates for O(logT) steps and broadcasts O(logK) bits in each communication step. We extend the work to sparse graphs with maximum degree KG and diameter D to propose LCC-UCB-GRAPH which enjoys a regret bound of ˜O(D(K/N+KG)DT). Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.

[abs][pdf][bib]       
© JMLR 2022. (edit, beta)

Mastodon