Journal of Machine Learning Research 3 (2002) 145-174 Submitted 10/01; Revised 1/02; Published 8/02
ε-MDPs: Learning in Varying Environments
István Szita szityu@cs.elte.hu
Bálint Takács deim@inf.elte.hu
András Lőrincz lorincz@inf.elte.hu
Department of Information Systems, Eötvös Loránd University
Pázmány Péter sétány 1/C
Budapest, Hungary H-1117
Editor: Sridhar Mahadevan
Abstract

In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even if considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
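To make the Q-learning setting referenced in the abstract concrete, the following is a minimal sketch of tabular Q-learning with ε-greedy exploration on a toy two-state MDP. The toy environment, constants, and function names are illustrative assumptions and are not the paper's experimental setup (which uses the two-segment pendulum).

```python
import random

# Toy two-state, two-action MDP (an illustrative assumption, not from the paper):
# action 0 stays in the current state with reward 0; action 1 switches state
# with reward 1, so the optimal policy is to always switch.
N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA = 0.9, 0.1  # discount factor and learning rate

def step(state, action):
    """Deterministic transition and reward for the toy MDP."""
    if action == 1:
        return 1 - state, 1.0
    return state, 0.0

def q_learning(steps=5000, eps=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        nxt, reward = step(state, action)
        # Q-learning update toward the bootstrapped target.
        target = reward + GAMMA * max(Q[nxt])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt
    return Q

Q = q_learning()
# Analytically, always switching yields V* = 1/(1 - GAMMA) = 10,
# so Q*(s, 1) = 10 and Q*(s, 0) = GAMMA * 10 = 9.
print(Q)
```

The learned values should approach the analytic fixed point Q*(s, 1) = 10 and Q*(s, 0) = 9, with the greedy policy preferring to switch in both states.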