## Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

** Rémi Munos**; 7(14):413−427, 2006.

### Abstract

We study a variance reduction technique for Monte Carlo estimation
of functionals in Markov chains. The method is based on designing
*sequential control variates* using successive approximations
of the function of interest *V*. Regular Monte Carlo estimates have
a variance of *O(1/N)*, where *N* is the number of sample trajectories
of the Markov chain. Here, we obtain a geometric variance reduction
*O(ρ ^{N})* (with ρ<1) up to a threshold that depends on
the approximation error

*V-AV*, where

*A*is an

*approximation operator*linear in the values. Thus, if

*V*belongs to the right approximation space (i.e.

*AV=V*), the variance decreases geometrically to zero.

An immediate application is value function estimation in Markov chains, which may be used for policy evaluation in a policy iteration algorithm for solving Markov Decision Processes.

Another important domain, for which variance reduction is highly needed,
is gradient estimation, that is computing the sensitivity *∂ _{α}V*
of the performance measure

*V*with respect to some parameter α of the transition probabilities. For example, in policy parametric optimization, computing an estimate of the policy gradient is required to perform a gradient optimization method.

We show that, using two approximations for the *value function*
and the *gradient*, a geometric variance reduction is also achieved,
up to a threshold that depends on the approximation errors of both
of those representations.

© JMLR 2006. (edit, beta) |