- ....1
- Unless otherwise noted, $\|\cdot\|$ denotes
the max-norm.
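For reference, the max-norm can be written out as below. This is the standard definition, stated under the assumption that the argument is either a finite-dimensional vector or a bounded real-valued function; the notation $\|\cdot\|$ and the symbols $x$, $n$, $f$, $X$, and $s$ are illustrative and not taken from the text.

```latex
\[
  \|x\| = \max_{1 \le i \le n} |x_i| ,
  \qquad
  \|f\| = \sup_{s \in X} |f(s)| .
\]
```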
- ... increases;2
- Note that the convergence of an infinite product (to a nonzero limit) implies that its factors converge to one.
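A short sketch of why this holds, assuming the usual convention that the partial products converge to a finite nonzero limit; the symbols $a_k$, $P_n$, and $P$ are introduced here for illustration only.

```latex
\[
  P_n = \prod_{k=1}^{n} a_k \;\longrightarrow\; P \neq 0
  \quad\Longrightarrow\quad
  a_n = \frac{P_n}{P_{n-1}} \;\longrightarrow\; \frac{P}{P} = 1 .
\]
```

The quotient is well defined for all sufficiently large $n$, because $P \neq 0$ forces $P_{n-1} \neq 0$ eventually.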
- ...
(E-learning,3
- The capital letter E distinguishes
E-learning from internet-based concepts that use the lowercase
prefix `e'.
- ....4
- Note
that depends on both and . When no ambiguity may arise we will
not explicitly show these dependencies.
- ... tracking5
- The term `velocity
field tracking' may better represent the underlying objective of
speed field tracking.
- ...Szepesvari97Neurocontroller,Szepesvari97Approximate.6
- Sign-properness
imposes conditions on the sign but not on the magnitude of the
components of the output of the approximate inverse dynamics.
- ... satisfied.7
- Justifying this
assumption requires techniques from the theory of ordinary
differential equations and is omitted here. See also [Barto(1978)].
- ....8
- Note that the condition on is a
kind of Lipschitz-continuity.
- ...
arm.9
- Both the SARSA implementation and its parameters were taken from
the work of [Aamodt(1997)]; the parameters can be considered
near-optimal for that implementation.
- ... `on'.10
- Note that because the optimal value function
is not available, the norm was computed relative to the last state
of the experiment.
- ... distributions11
-
abbreviates
.