Finite-sample Analysis of Bellman Residual Minimization
Odalric-Ambrym Maillard (INRIA Lille Nord-Europe), Remi Munos (INRIA Lille Nord-Europe),
Alessandro Lazaric (INRIA Lille Nord-Europe), and Mohammad Ghavamzadeh (INRIA Lille Nord-Europe);
JMLR W&P 13:299-314, 2010.
Abstract
We consider the Bellman residual minimization approach for solving
discounted Markov decision problems, where we assume that a generative
model of the dynamics and rewards is available. At each policy
iteration step, an approximation of the value function for the current
policy is obtained by minimizing an empirical Bellman residual
defined on a set of n states drawn i.i.d. from a distribution \mu, the
immediate rewards, and the next states sampled from the model. Our
main result is a generalization bound for the Bellman residual in linear
approximation spaces. In particular, we prove that the empirical
Bellman residual approaches the true (quadratic) Bellman residual
in \mu-norm with a rate of order O(1=pn). This result implies that
minimizing the empirical residual is indeed a sound approach for the
minimization of the true Bellman residual which guarantees a good
approximation of the value function for each policy. Finally, we derive
performance bounds for the resulting approximate policy iteration algorithm
in terms of the number of samples n and a measure of how
well the function space is able to approximate the sequence of value
functions.