Convergence properties of the event-value function for the two-link pendulum are shown in Figure 5. The experiment concerns crude discretization of the state space. No changes are made to the parameters of the pendulum. However, the crude discretization of the environment, together with a robust controller that is part of the environment, manifests itself as a varying environment.
The theorems of Section 3 concern the supremum norm. Two curves of the supremum norm are shown in Figure 5A, one with the SDS controller turned off and another with the SDS controller on. Convergence occurs for learning with the SDS controller on (see footnote 10). Interestingly, convergence is faster with the SDS controller than without it. This is a consequence of the larger variety of actions available when the robust controller is applied.
The square norm measured against the last event-value function of this series of experiments (Figure 5B) may provide insight into the performance of the two-link pendulum. The performance of the pendulum can be characterized by the average task duration and the standard deviation of task duration during the course of learning (Figure 5C). There is a clear advantage for learning with the robust controller over learning without it.
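For readers who wish to reproduce curves of this kind, the quantities discussed above can be sketched as follows. This is a minimal illustration and not the code used in the experiments: it assumes that the event-value function is stored as a flat list over the discretized entries, and it takes the square norm to be the Euclidean norm of the difference. The function names `sup_norm`, `square_norm` and `duration_stats` are hypothetical.

```python
import math


def sup_norm(e_a, e_b):
    """Supremum norm: largest absolute entrywise difference.

    e_a and e_b are assumed to be flat lists of equal length holding
    two tabular estimates of the event-value function.
    """
    return max(abs(a - b) for a, b in zip(e_a, e_b))


def square_norm(e_a, e_b):
    """Square norm, taken here (an assumption) as the Euclidean norm
    of the difference between the two estimates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e_a, e_b)))


def duration_stats(durations):
    """Average and (population) standard deviation of task durations."""
    mean = sum(durations) / len(durations)
    variance = sum((d - mean) ** 2 for d in durations) / len(durations)
    return mean, math.sqrt(variance)
```

Under these assumptions, a curve of the type shown in Figure 5B would be obtained by evaluating `square_norm` between each stored estimate and the last estimate of the series, while `duration_stats` applied to a window of recent trials would give the two quantities of Figure 5C.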