The Convergence of a General Value Iteration Process

Let $X$ be an arbitrary state space and denote by $\mathbf{B}(X)$ the set of value functions over $X$ (i.e., the set of bounded $X \to \mathbb{R}$ functions), and let $T: \mathbf{B}(X) \to \mathbf{B}(X)$ be an arbitrary contraction mapping with (unique) fixed point $V^*$.

Let $T_t: \mathbf{B}(X) \times \mathbf{B}(X) \to \mathbf{B}(X)$ be a sequence of stochastic operators. The second argument of $T_t$ is intended to modify the first one, in order to get a better approximation of $TV$. Formally, let $U_0$ be an arbitrary value function and let $U_{t+1} = T_t (U_t,V)$. The sequence $T_t$ is said to approximate $T$ at $V$ with probability one over $X$ if $\lim_{t\to\infty} U_t = TV$ uniformly over $X$.
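
The definition above can be illustrated numerically. The following is a minimal sketch, not taken from the source: the three-state transition matrix `P`, the rewards `R`, the modulus `GAMMA`, the step size `1/(t+1)`, and the helper names `T`, `T_t`, and `iterate` are all assumptions chosen for illustration. With $V$ held fixed, the iterates $U_{t+1} = T_t(U_t, V)$ are running means of one-sample estimates of $TV$, so they should settle near $TV$.

```python
import random

GAMMA = 0.9  # contraction modulus of the illustrative operator T (assumed)

# Hypothetical 3-state example: transition probabilities P[x][y] and rewards R[x].
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.0, 0.2, 0.8]]
R = [1.0, 0.0, 2.0]


def T(V):
    """Deterministic contraction: (TV)(x) = R(x) + GAMMA * sum_y P(x, y) V(y)."""
    return [R[x] + GAMMA * sum(p * v for p, v in zip(P[x], V))
            for x in range(len(V))]


def T_t(U, V, t, rng):
    """Stochastic operator: move U(x) toward a one-sample estimate of (TV)(x)."""
    alpha = 1.0 / (t + 1)  # with this step size U_{t+1} is the mean of the samples
    out = []
    for x in range(len(U)):
        y = rng.choices(range(len(V)), weights=P[x])[0]  # sample y ~ P(x, .)
        out.append((1 - alpha) * U[x] + alpha * (R[x] + GAMMA * V[y]))
    return out


def iterate(V, U0, steps, seed=0):
    """Run U_{t+1} = T_t(U_t, V) with V held fixed and return the final U."""
    rng = random.Random(seed)
    U = list(U0)
    for t in range(steps):
        U = T_t(U, V, t, rng)
    return U


V = [0.0, 1.0, -1.0]
U = iterate(V, U0=[5.0, 5.0, 5.0], steps=20000)
error = max(abs(u - tv) for u, tv in zip(U, T(V)))
print("sup-norm distance between U_t and TV:", error)
```

The printed distance shrinks as `steps` grows, which is the finite-state analogue of uniform convergence over $X$. A single simulated run only suggests, and does not prove, convergence with probability one.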

Theorem A.1 (Szepesvári and Littman) Let the sequence of random operators $T_t$ approximate $T$ at $V^*$ with probability one uniformly over $X$. Let $V_0$ be an arbitrary value function, and define $V_{t+1} = T_t(V_t, V_t)$. If there exist functions $0 \leq F_t(x) \leq 1$ and $0 \leq G_t(x) \leq 1$ satisfying the conditions below with probability one, then $V_t$ converges to $V^*$ with probability one uniformly over $X$:

1. for all $U_1, U_2 \in \mathbf{B}(X)$ and all $x\in X$,

$\displaystyle \big\vert T_t(U_1,V^*)(x)-T_t(U_2,V^*)(x) \big\vert \leq G_t(x) \big\vert U_1(x) - U_2(x) \big\vert$

2. for all $U, V \in \mathbf{B}(X)$ and all $x\in X$,

$\displaystyle \big\vert T_t(U,V^*)(x)-T_t(U,V)(x) \big\vert \leq F_t(x) \sup_{x'} \big\vert V^*(x') - V(x') \big\vert$

3. for all $k > 0$, $\prod_{t=k}^n G_t(x)$ converges to zero uniformly in $x$ as $n$ increases; and,

4. there exists $0 \leq \gamma < 1$ such that for all $x\in X$ and large enough $t$,

$\displaystyle F_t(x)\leq \gamma(1-G_t(x)).$
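
As an illustrative check of these conditions (a sketch under stated assumptions, not part of the cited proof), consider an averaging operator with step sizes $\alpha_t(x) \in [0,1]$ and additive noise $\eta_t(x)$ that does not depend on $U$ or $V$. Both $\alpha_t$ and $\eta_t$ are symbols introduced here for illustration, and $T$ is assumed to be a sup-norm contraction with modulus $\gamma$.

```latex
\begin{align*}
T_t(U,V)(x) &= \bigl(1-\alpha_t(x)\bigr)\,U(x)
              + \alpha_t(x)\bigl([TV](x) + \eta_t(x)\bigr), \\
\bigl|T_t(U_1,V^*)(x) - T_t(U_2,V^*)(x)\bigr|
  &= \bigl(1-\alpha_t(x)\bigr)\,\bigl|U_1(x)-U_2(x)\bigr|
  && \Rightarrow\quad G_t(x) = 1-\alpha_t(x), \\
\bigl|T_t(U,V^*)(x) - T_t(U,V)(x)\bigr|
  &= \alpha_t(x)\,\bigl|[TV^*](x)-[TV](x)\bigr| \\
  &\le \gamma\,\alpha_t(x)\,\sup_{x'}\bigl|V^*(x')-V(x')\bigr|
  && \Rightarrow\quad F_t(x) = \gamma\,\alpha_t(x), \\
F_t(x) &= \gamma\bigl(1-G_t(x)\bigr), \\
\prod_{t=k}^{n} G_t(x) &= \prod_{t=k}^{n}\frac{t}{t+1} = \frac{k}{n+1}
  \;\xrightarrow[n\to\infty]{}\; 0
  && \text{for the choice } \alpha_t(x) = \tfrac{1}{t+1}.
\end{align*}
```

Under these assumptions the first, second, and fourth conditions hold with $G_t = 1-\alpha_t$ and $F_t = \gamma\,\alpha_t$, the fourth with equality. The third condition reduces to a requirement on the step sizes, which the choice $\alpha_t(x) = 1/(t+1)$ satisfies uniformly in $x$.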

The proof can be found in [Szepesvári and Littman (1996)]. We cite here the lemma on which the proof is based, since our generalization concerns this lemma.

Lemma A.2 Let