Lemma D.1 (Corollary 4.3)
Assume that the environment is such that
![$ \sum_y \vert P(x,u_1,y) -
P(x,u_2,y)\vert \le K \Vert u_1 - u_2\Vert$](img256.gif)
for all
![$ x,y,u_1,u_2$](img257.gif)
. Let
![$ \varepsilon $](img1.gif)
be a prescribed positive number. For sufficiently large
![$ \Lambda $](img7.gif)
and sufficiently small time steps, the SDS controller
described in Equation 10 and the environment form an
![$ \varepsilon $](img1.gif)
-MDP.
Proof.
From [Szepesvári et al. (1997)] it is known that, for
sufficiently fine time steps, the eventual tracking error is
bounded by
![$ \textit{const}/\Lambda$](img432.gif)
, i.e., for sufficiently large
![$ t$](img22.gif)
,
For sufficiently large
![$ \Lambda $](img7.gif)
,
![$ \textit{const}/\Lambda \le
\varepsilon $](img434.gif)
. Therefore, for an arbitrary value function
![$ S$](img337.gif)
we may write
This means that the system is indeed an
![$ \varepsilon $](img1.gif)
-MDP.
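
The chain of bounds behind this conclusion can be sketched as follows. The notation is an assumption made for illustration and is not taken from the text: $u_t$ denotes the action actually emitted by the SDS controller at time $t$, $\bar u_t$ the action that would realize the desired transition exactly (so that $\Vert u_t - \bar u_t\Vert$ is the tracking error), and $S$ is treated as a bounded function of the next state $y$ with sup-norm $\Vert S\Vert_\infty$.

```latex
% Sketch under assumed notation (u_t, \bar u_t, S(y), \|S\|_\infty are illustrative):
% step 1 uses the triangle inequality and |S(y)| <= \|S\|_\infty,
% step 2 uses the Lipschitz assumption of the lemma,
% step 3 uses the eventual tracking-error bound const/\Lambda (valid for large t).
\begin{align*}
\Bigl| \sum_y \bigl( P(x,u_t,y) - P(x,\bar u_t,y) \bigr)\, S(y) \Bigr|
  &\le \sum_y \bigl| P(x,u_t,y) - P(x,\bar u_t,y) \bigr| \; \Vert S\Vert_\infty \\
  &\le K \, \Vert u_t - \bar u_t \Vert \; \Vert S\Vert_\infty \\
  &\le K \, \frac{\textit{const}}{\Lambda} \; \Vert S\Vert_\infty .
\end{align*}
```

Under these assumptions, taking $\Lambda$ large enough that $K\,\textit{const}/\Lambda \le \varepsilon$ (equivalently, absorbing the factor $K$ into the constant) bounds the perturbation by $\varepsilon \Vert S\Vert_\infty$ for every bounded $S$.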