A Geometric Approach to Multi-Criterion Reinforcement Learning
Shie Mannor, Nahum Shimkin; 5(Apr):325--360, 2004.
Abstract
We consider the problem of reinforcement learning in a controlled
Markov environment with multiple objective functions of the
long-term average reward type. The environment is initially
unknown, and furthermore may be affected by the actions of other
agents, actions that are observed but cannot be predicted
beforehand. We capture this situation using a stochastic game
model, where the learning agent is facing an adversary whose
policy is arbitrary and unknown, and where the reward function is
vector-valued. State recurrence conditions are imposed throughout.
In our basic problem formulation, a desired target set is
specified in the vector reward space, and the objective of the
learning agent is to
approach the target set, in the sense
that the long-term average reward vector will belong to this set.
We devise appropriate learning algorithms that essentially run
multiple reinforcement learning algorithms for the standard scalar
reward problem and combine them using the geometric insight from
the theory of approachability for vector-valued stochastic
games. We then address the more general and optimization-related
problem, where a nested class of possible target sets is
prescribed, and the goal of the learning agent is to approach the
smallest possible target set (which will generally depend on the
unknown system parameters). A particular case which falls into
this framework is that of stochastic games with average reward
constraints, and further specialization provides a reinforcement
learning algorithm for constrained Markov decision processes. Some
basic examples are provided to illustrate these results.
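
The following is a minimal, self-contained Python sketch of the geometric
steering idea behind approachability that the abstract refers to, reduced to
a vector-reward multi-armed bandit with a box-shaped target set. The bandit
environment, the box target set, and the use of empirical mean estimates in
place of the scalar reinforcement learning subroutines are illustrative
assumptions, not the algorithm developed in the paper.

# Illustrative sketch only: approachability-style steering toward a target
# set in the average-reward plane, in a toy vector-reward bandit.
import numpy as np

rng = np.random.default_rng(0)

# Two actions, each with a 2-dimensional mean reward vector (unknown to the agent).
TRUE_MEANS = np.array([[1.0, 0.0],
                       [0.0, 1.0]])

# Assumed target set: the box [0.3, 0.7] x [0.3, 0.7] in the average-reward plane.
LOW, HIGH = np.array([0.3, 0.3]), np.array([0.7, 0.7])

def project_onto_box(x):
    """Closest point of the box target set to x."""
    return np.clip(x, LOW, HIGH)

def sample_reward(action):
    """Noisy vector-valued reward for the chosen action."""
    return TRUE_MEANS[action] + 0.1 * rng.standard_normal(2)

n_actions = TRUE_MEANS.shape[0]
est_means = np.zeros((n_actions, 2))   # empirical mean reward vector per action
counts = np.zeros(n_actions)
avg_reward = np.zeros(2)               # running long-term average reward vector

T = 20_000
for t in range(1, T + 1):
    # Geometric steering step: if the running average lies outside the target
    # set, steer toward its closest point in the set.
    closest = project_onto_box(avg_reward)
    direction = closest - avg_reward
    if np.linalg.norm(direction) < 1e-12 or t <= n_actions or rng.random() < 0.05:
        action = rng.integers(n_actions)      # explore, or already inside the set
    else:
        # Scalar surrogate problem: maximize the reward projected on `direction`
        # (in the full construction this is handled by a scalar RL algorithm).
        action = int(np.argmax(est_means @ direction))

    r = sample_reward(action)
    counts[action] += 1
    est_means[action] += (r - est_means[action]) / counts[action]
    avg_reward += (r - avg_reward) / t

print("final average reward vector:", np.round(avg_reward, 3))
print("inside target set:", bool(np.all((avg_reward >= LOW) & (avg_reward <= HIGH))))

The key step is that, whenever the running average lies outside the target
set, the agent acts to maximize the reward projected on the direction
pointing back toward the set; this mirrors the role played by the scalar
reward learning algorithms in the construction described above.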