To begin with, we recall the definition of a Markov Decision Process (MDP) [Puterman(1994)]. A (finite) MDP is defined by the tuple $\langle S, A, T, R \rangle$, where $S$ and $A$ denote the finite sets of states and actions, respectively. $T : S \times A \times S \to [0, 1]$ is called the transition function, since $T(s, a, s')$ gives the probability of arriving at state $s'$ after executing action $a$ in state $s$. Finally, $R : S \times A \times S \to \mathbb{R}$ is the reward function; $R(s, a, s')$ gives the immediate reward for the transition $(s, a, s')$.
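The definition above can be made concrete with a minimal sketch in Python, assuming a hypothetical two-state, two-action MDP (the states, actions, and numbers are illustrative only, not taken from the text): the transition function is a table mapping each state-action pair to a distribution over successor states, and the reward function assigns a number to each transition.

```python
import random

# Hypothetical toy MDP: two states, two actions (illustration only).
states = ["s0", "s1"]
actions = ["stay", "move"]

# T[(s, a)][s'] = T(s, a, s'): probability of arriving at state s'
# after executing action a in state s.
T = {
    ("s0", "stay"): {"s0": 1.0, "s1": 0.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.0, "s1": 1.0},
    ("s1", "move"): {"s0": 0.8, "s1": 0.2},
}

# R[(s, a, s')]: immediate reward for the transition (s, a, s').
# Here, arriving at s1 pays 1, anything else pays 0.
R = {(s, a, s2): (1.0 if s2 == "s1" else 0.0)
     for (s, a), probs in T.items() for s2 in probs}

def step(s, a, rng=random):
    """Sample a successor s' ~ T(s, a, .) and return (s', R(s, a, s'))."""
    successors, probs = zip(*T[(s, a)].items())
    s2 = rng.choices(successors, weights=probs)[0]
    return s2, R[(s, a, s2)]
```

Each row of `T` sums to one, reflecting that $T(s, a, \cdot)$ is a probability distribution over successor states.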