
Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Eric Xia, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan; 24(392):1−43, 2023.


Reinforcement learning algorithms are known to exhibit a variety of convergence rates depending on the problem structure. Recent years have witnessed considerable progress in developing theory that is instance-dependent, along with algorithms that achieve such instance-optimal guarantees. However, important questions remain about how to use such notions for inferential purposes, or for early stopping, so that data and computational resources can be saved on "easy" problems. This paper develops data-dependent procedures that output instance-dependent confidence regions for evaluating and optimizing policies in a Markov decision process. Notably, our procedures require only black-box access to an instance-optimal algorithm and re-use the samples collected by the estimation algorithm itself. The resulting data-dependent stopping rule adapts to the instance-specific difficulty of the problem and allows early termination on problems with favorable structure. We highlight the benefit of such early-stopping rules via numerical studies.
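The core idea of a data-dependent stopping rule can be illustrated with a minimal sketch: draw samples from a black-box estimator, maintain an empirical confidence radius, and terminate as soon as that radius falls below a target tolerance. Note this is only a schematic illustration of the early-stopping principle, not the paper's actual procedure; the `sample` callable, the normal-approximation confidence width, and all parameter names here are hypothetical stand-ins.

```python
import math
import random

def early_stop_estimate(sample, eps, max_samples):
    """Draw samples one at a time and stop once the data-dependent
    confidence radius falls below eps (or the budget is exhausted).

    Returns (estimate, radius, samples_used). This uses a simple
    normal-approximation 95% half-width as the confidence radius,
    purely for illustration of the stopping principle.
    """
    xs = []
    for n in range(1, max_samples + 1):
        xs.append(sample())
        if n >= 2:
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / (n - 1)
            radius = 1.96 * math.sqrt(var / n)  # data-dependent CI half-width
            if radius <= eps:
                return mean, radius, n  # early termination on an easy instance
    mean = sum(xs) / len(xs)
    return mean, float("inf"), len(xs)

random.seed(0)
# An "easy" (low-variance) instance terminates long before the budget.
mean, radius, used = early_stop_estimate(
    lambda: 1.0 + 0.05 * random.gauss(0, 1), eps=0.02, max_samples=10_000)
print(f"used {used} of 10000 samples; estimate {mean:.3f} ± {radius:.3f}")
```

The point of the sketch is that the number of samples consumed adapts to the observed variance: a low-variance (easy) instance stops after a small fraction of the budget, while a hard instance would run longer.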

© JMLR 2023.