Asymptotic Boundedness of Value Iteration
In this section we prove a generalized form of Szepesvári and Littman's convergence
theorem (for the original theorem, see Appendix A). We do not require probability-one
uniform convergence of the approximating operators, only a sufficiently close
approximation. Therefore the theorem can be applied to prove results about algorithms
in generalized $\epsilon$-MDPs.
Our definition of closeness, both for value functions and for dynamic-programming
operators, is given below.
Let $\mathcal{X}$ be an arbitrary state space and denote by
$\mathcal{V} = \{ V : \mathcal{X} \to \mathbb{R} \}$ the set of value functions. Let
$T : \mathcal{V} \to \mathcal{V}$ be an arbitrary contraction mapping with unique
fixed point $v^*$, and let $T_t : \mathcal{V} \times \mathcal{V} \to \mathcal{V}$
be a sequence of random operators.
Definition 3.1
A sequence of value functions $v_t$ $\epsilon$-approximates $v \in \mathcal{V}$ with
$\epsilon > 0$ over a set $X \subseteq \mathcal{X}$, if
$$\limsup_{t \to \infty} \, \sup_{x \in X} \bigl| v_t(x) - v(x) \bigr| \le \epsilon$$
with probability one.
Definition 3.2
We say that the operator sequence $T_t$ $\epsilon$-approximates $T$ at
$v \in \mathcal{V}$ over $X \subseteq \mathcal{X}$, if for any
$V_0 \in \mathcal{V}$ and for the sequence defined by $V_{t+1} = T_t(V_t, v)$,
$V_t$ $\epsilon$-approximates $T v$ over $X$ with probability one.
Note that $\epsilon$ may depend on the approximated value function $v$,
unlike in the previous example in Equation 4.
$\epsilon$-approximation of value functions is, indeed, weaker (more
general) than probability-one uniform convergence: the latter
means that for all $\epsilon' > 0$ there exists a $t_0$ such that
$\sup_{x \in X} | v_t(x) - v(x) | \le \epsilon'$ for all $t \ge t_0$,
whereas an equivalent form of $\epsilon$-approximation is that for
all $\epsilon' > \epsilon$ there exists a $t_0$ such that
$\sup_{x \in X} | v_t(x) - v(x) | \le \epsilon'$ for all $t \ge t_0$,
where $\epsilon$ is fixed.
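To make the distinction concrete, here is a small numerical sketch (purely illustrative; the target values, the tolerance, and the perturbation are assumptions, not taken from the text). It builds a sequence of value functions that never converges to its target, yet $\epsilon$-approximates it in the sense of Definition 3.1, because the limit superior of the sup-distance equals $\epsilon$.

```python
import numpy as np

# Illustrative only: a 3-state target value function and a sequence v_t that
# epsilon-approximates it (Definition 3.1) without converging to it.
v_star = np.array([1.0, 2.0, 3.0])
epsilon = 0.1

def v_t(t):
    transient = 5.0 / (t + 1)               # vanishes as t grows
    persistent = epsilon * np.sin(0.5 * t)  # bounded, never vanishes
    return v_star + transient + persistent

# Empirical check of the limsup criterion: sup-distance over a long tail.
tail = [np.max(np.abs(v_t(t) - v_star)) for t in range(1000, 2000)]
print(f"max sup-distance over the tail: {max(tail):.4f}  (epsilon = {epsilon})")
# The tail maximum stays close to epsilon and tends to epsilon as the tail
# moves out, so limsup_t ||v_t - v_star|| = epsilon: v_t epsilon-approximates
# v_star, although it does not converge (the sine term never dies out).
```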
Theorem 3.3
Let $T$ be an arbitrary mapping with fixed point $v^*$, and let $T_t$
$\epsilon$-approximate $T$ at $v^*$ over $X \subseteq \mathcal{X}$. Let $V_0$ be an
arbitrary value function, and define $V_{t+1} = T_t(V_t, V_t)$. If
there exist functions $0 \le F_t(x) \le 1$ and $0 \le G_t(x) \le 1$
satisfying the conditions below with probability one,
- for all $U_1, U_2 \in \mathcal{V}$ and all $x \in \mathcal{X}$,
$\bigl| T_t(U_1, v^*)(x) - T_t(U_2, v^*)(x) \bigr| \le G_t(x) \, \bigl| U_1(x) - U_2(x) \bigr|$;
- for all $U, V \in \mathcal{V}$ and all $x \in \mathcal{X}$,
$\bigl| T_t(U, v^*)(x) - T_t(U, V)(x) \bigr| \le F_t(x) \, \| v^* - V \|$;
- for all $k > 0$,
$\prod_{t=k}^{n} G_t(x)$ converges to zero uniformly in $x$ as $n$ increases; and,
- there exists $0 \le \gamma < 1$
such that for all $x$ and sufficiently large $t$, $F_t(x) \le \gamma \bigl( 1 - G_t(x) \bigr)$,

then $V_t$ $\kappa$-approximates $v^*$ over $X$, where
$\kappa = \frac{2\epsilon}{1-\gamma}$.
Proof.
The proof is similar to that of the original theorem. First we
define the sequence of auxiliary value functions $U_t$ by the recursion
$U_0 = V_0$, $U_{t+1} = T_t(U_t, v^*)$. Since $T_t$
$\epsilon$-approximates $T$ at $v^*$, the sequence $U_t$
$\epsilon$-approximates $T v^* = v^*$, i.e., for sufficiently large $t$
and $x \in X$, $| U_t(x) - v^*(x) | \le \epsilon'$, where $\epsilon' > \epsilon$
is arbitrary (using the equivalent form above). Let
$$\Delta_t(x) = \bigl| V_t(x) - U_t(x) \bigr|.$$
For $\Delta_{t+1}(x)$ we have
$$\begin{aligned}
\Delta_{t+1}(x) &= \bigl| T_t(V_t, V_t)(x) - T_t(U_t, v^*)(x) \bigr| \\
&\le \bigl| T_t(V_t, V_t)(x) - T_t(V_t, v^*)(x) \bigr|
   + \bigl| T_t(V_t, v^*)(x) - T_t(U_t, v^*)(x) \bigr| \\
&\le F_t(x) \, \| v^* - V_t \| + G_t(x) \, \Delta_t(x) \\
&\le G_t(x) \, \Delta_t(x) + F_t(x) \bigl( \| \Delta_t \| + \epsilon' \bigr)
\end{aligned}$$
for sufficiently large $t$. Then, by Lemma
C.1 (found in the appendix), we get
that $\limsup_{t \to \infty} \| \Delta_t \| \le \frac{\epsilon'}{1-\gamma}$. From this,
$$\limsup_{t \to \infty} \| V_t - v^* \|
\le \limsup_{t \to \infty} \| \Delta_t \| + \limsup_{t \to \infty} \| U_t - v^* \|
\le \frac{\epsilon'}{1-\gamma} + \epsilon'
\le \frac{2 \epsilon'}{1-\gamma},$$
and since $\epsilon' > \epsilon$ was arbitrary, $V_t$ $\kappa$-approximates $v^*$
over $X$ with $\kappa = \frac{2\epsilon}{1-\gamma}$.
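As an informal sanity check of the theorem (not part of the original text), the following sketch runs value iteration on a small, hypothetical 3-state, 2-action MDP whose Bellman operator is corrupted at every step by a bounded disturbance of size at most $\epsilon$; such an operator sequence $\epsilon$-approximates the exact operator at $v^*$. The MDP, the disturbance model, and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-state, 2-action MDP (illustrative numbers only).
n_states, n_actions, gamma, eps = 3, 2, 0.9, 0.05
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # R[x, a]

def bellman(V):
    # Exact Bellman optimality operator T:
    # (T V)(x) = max_a [ R(x, a) + gamma * sum_y P(x, a, y) V(y) ]
    return np.max(R + gamma * (P @ V), axis=1)

# Fixed point v* of T, obtained by iterating T to numerical convergence.
v_star = np.zeros(n_states)
for _ in range(2000):
    v_star = bellman(v_star)

# Perturbed value iteration: each step applies T plus a bounded disturbance
# of size at most eps, i.e., an epsilon-approximation of T at v*.
V = np.zeros(n_states)
errors = []
for t in range(5000):
    noise = rng.uniform(-eps, eps, size=n_states)
    V = bellman(V) + noise
    errors.append(np.max(np.abs(V - v_star)))

kappa = 2 * eps / (1 - gamma)            # the bound kappa = 2*eps/(1 - gamma)
tail_error = max(errors[-1000:])         # empirical asymptotic error
print(f"asymptotic error ~ {tail_error:.4f},  bound kappa = {kappa:.4f}")
# The tail error stays below kappa: the iterates remain in a neighborhood of
# v_star instead of converging to it exactly.
```

For this simple disturbance model the empirical error settles near $\epsilon/(1-\gamma)$, well inside the guaranteed neighborhood of radius $\kappa$.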