[Szepesvári and Littman(1996)] have introduced a more general
model. Their basic concept is that in the Bellman equations, the
operation $\sum_y P(x,a,y)(\cdot)$ (i.e., taking the expected value
w.r.t. the transition probabilities) describes the effect of the
environment, while the operation $\max_a$ describes the effect
of an optimal agent (i.e., selecting an action with maximum
expected value). By changing these operators, other well-known models
can be recovered.
A generalized MDP is defined by the tuple
$\langle X, A, R, \otimes, \oplus, \gamma \rangle$, where $X$, $A$, $R$ are defined as above;
$\otimes$ is an ``expected value-type" operator
and $\oplus$ is a ``maximization-type" operator. For
example, by setting $(\otimes V)(x,a) = \sum_y P(x,a,y) V(y)$
and $(\oplus Q)(x) = \max_a Q(x,a)$ (where
$V: X \to \mathbb{R}$ and $Q: X \times A \to \mathbb{R}$), the
expected-reward MDP model appears.
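
As a concrete illustration of this instantiation, the sketch below implements the two operators for the expected-reward case with NumPy. The array names and shapes (a transition tensor `P` of shape $(|X|, |A|, |X|)$, a value vector `V` of shape $(|X|,)$) are assumptions made for this example, not notation from the paper.

```python
import numpy as np

def otimes(V, P):
    """Expected value-type operator: (otimes V)(x,a) = sum_y P(x,a,y) V(y).

    P has shape (|X|, |A|, |X|); V has shape (|X|,).
    Returns a Q-shaped array of shape (|X|, |A|).
    """
    return P @ V  # batched matrix-vector product over states

def oplus(Q):
    """Maximization-type operator: (oplus Q)(x) = max_a Q(x,a)."""
    return Q.max(axis=1)
```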
The task is to find a value function satisfying the abstract
Bellman equations:

$$Q^*(x,a) = R(x,a) + \gamma \, (\otimes V^*)(x,a),$$

$$V^*(x) = (\oplus Q^*)(x).$$
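
To show how such a fixed point can be computed, here is a minimal sketch of value iteration on the abstract equations, reusing the hypothetical `otimes`/`oplus` functions above; it assumes a reward array `R` of shape $(|X|, |A|)$ and is an illustration under those assumptions, not the paper's algorithm.

```python
def generalized_value_iteration(R, P, gamma, oplus, otimes, tol=1e-8):
    """Iterate the abstract Bellman equations to an approximate fixed point:
        Q = R + gamma * (otimes V)
        V = oplus Q
    For gamma < 1 and non-expansive oplus/otimes, the composite update
    is a contraction, so the iteration converges to V*.
    """
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * otimes(V, P)  # abstract Bellman: Q = R + gamma (otimes V)
        V_new = oplus(Q)              # abstract Bellman: V = oplus Q
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```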
The great advantage of the generalized MDP model is that a wide range of models, e.g., Markov games [Littman(1994)], alternating Markov games [Boyan(1992)], discounted expected-reward MDPs [Watkins and Dayan(1992)], risk-sensitive MDPs [Heger(1994)], and exploration-sensitive MDPs [John(1994)], can be discussed in this unified framework. For details, see the work of [Szepesvári and Littman(1996)].
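
To suggest how some of the listed models arise from operator choices, the hypothetical variants below plug into the same iteration sketched above: a state-dependent max/min for alternating Markov games, and a worst-case replacement for the expectation in the spirit of risk-sensitive MDPs. Both are rough sketches under the array conventions assumed earlier, not definitions from the cited works.

```python
def oplus_alternating(agent_moves):
    """Alternating Markov game: maximize in states where the agent moves
    (agent_moves[x] is True), minimize in the opponent's states."""
    def op(Q):
        return np.where(agent_moves, Q.max(axis=1), Q.min(axis=1))
    return op

def otimes_worst_case(V, P):
    """Risk-sensitive (worst-case) variant: replace the expectation with a
    minimum over next states reachable with positive probability.
    (State-action pairs with no successors get +inf; acceptable for a sketch.)"""
    masked = np.where(P > 0, V[np.newaxis, np.newaxis, :], np.inf)
    return masked.min(axis=2)  # shape (|X|, |A|)
```

For instance, `generalized_value_iteration(R, P, gamma, oplus_alternating(agent_moves), otimes)` would then compute the value of the alternating game under these assumed conventions.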