
Off-policy Learning With Eligibility Traces: A Survey

Matthieu Geist, Bruno Scherrer; 15(10):289−333, 2014.

Abstract

In the framework of Markov Decision Processes, we consider linear off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory possibly generated by some other policy. We briefly review on-policy learning algorithms from the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms (off-policy LSTD(λ), LSPE(λ), TD(λ), TDC/GQ(λ)) and suggests new extensions (off-policy FPKF(λ), BRM(λ), gBRM(λ), GTD2(λ)). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms, on- and off-policy LSTD(λ)/LSPE(λ), and TD(λ) if the feature space dimension is too large for a least-squares approach, perform the best.
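To make the setting concrete, below is a minimal sketch of one of the simplest algorithms in the family surveyed here: off-policy TD(λ) with linear function approximation and per-decision importance-sampling ratios. The function names, parameter values, and the `phi`, `pi`, `mu` interfaces are illustrative assumptions, not the paper's notation, and the survey derives several variants (including recursive least-squares forms such as off-policy LSTD(λ)/LSPE(λ)) that are not shown here.

```python
import numpy as np

def off_policy_td_lambda(trajectory, phi, pi, mu,
                         alpha=0.05, gamma=0.95, lam=0.9):
    """Sketch of off-policy TD(lambda) with per-decision importance sampling.

    trajectory : list of (s, a, r, s_next) transitions generated by the
                 behavior policy mu.
    phi        : feature map, phi(s) -> 1-D numpy array of dimension d.
    pi, mu     : target / behavior policies, pi(a, s) -> probability of a in s.
    Returns the weight vector theta; the value estimate is theta @ phi(s).
    """
    d = phi(trajectory[0][0]).shape[0]
    theta = np.zeros(d)   # parameters of the linear value-function approximation
    e = np.zeros(d)       # eligibility trace

    for (s, a, r, s_next) in trajectory:
        rho = pi(a, s) / mu(a, s)                                  # importance-sampling ratio
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)   # temporal-difference error
        e = rho * (gamma * lam * e + phi(s))                       # off-policy eligibility trace
        theta = theta + alpha * delta * e                          # gradient-style parameter update
    return theta
```

The least-squares counterparts reviewed in the paper replace the stochastic-gradient update above with recursively maintained matrix statistics (an A matrix and b vector, or their inverses), trading per-step cost for sample efficiency when the feature dimension is moderate.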

© JMLR 2014.