Home Page

Papers

Submissions

News

Scope

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Login



RSS Feed

Better Rates for Any Adversarial Deterministic MDP

Ofer Dekel, Elad Hazan
;
JMLR W&CP 28 (3) : 675–683, 2013

Abstract

We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of \(O(T^{2/3})\) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was \(O(T^{3/4})\). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.

Related Material