Reinforcement Learning with Factored States and Actions
Brian Sallans, Geoffrey E. Hinton; 5(Aug):1063--1088, 2004.
Abstract
A novel method is presented for approximating
the value function and selecting good actions for Markov
decision processes with large state and action spaces.
The method approximates state-action values as negative
free energies in an undirected graphical model called a
product of experts. The model parameters can be learned
efficiently because, for a product of experts, both the values
and their derivatives are easy to compute. Actions can be found
even in large factored action spaces by using Markov
chain Monte Carlo sampling. Simulation results show
that the product of experts approximation can be used to
solve large problems. In one simulation it is used
to find actions in action spaces of size $2^{40}$.
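The abstract combines three ideas: a state-action value taken as a negative free energy, gradient-based learning of that value, and Markov chain Monte Carlo search over factored actions. The following is a minimal sketch of how they fit together. It assumes a restricted-Boltzmann-machine-style product of experts with binary state bits, binary action bits and binary hidden units. The class name `FactoredQ`, the SARSA-style temporal-difference update and the single-bit Gibbs sampler over action bits (with the hidden units summed out analytically) are illustrative choices for this sketch. They are not necessarily the exact procedure used in the paper.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class FactoredQ:
    """Hypothetical RBM-style product of experts over binary state and action bits.

    Q(s, a) = -F(s, a), where F is the free energy with binary hidden units summed out:
        F(v) = -sum_i b_i v_i - sum_k softplus(c_k + sum_i W[i][k] v_i)
    and v is the concatenation of the state bits and the action bits.
    """

    def __init__(self, n_state, n_action, n_hidden, seed=0):
        rng = random.Random(seed)
        n_vis = n_state + n_action
        self.n_state, self.n_action = n_state, n_action
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_vis)]
        self.b = [0.0] * n_vis     # visible (state + action) biases
        self.c = [0.0] * n_hidden  # hidden biases

    def _inputs(self, v):
        # Total input to each hidden unit given visible vector v.
        return [self.c[k] + sum(v[i] * self.W[i][k] for i in range(len(v)))
                for k in range(len(self.c))]

    def q_value(self, s, a):
        # Negative free energy: exact, because hidden units are summed out in closed form.
        v = list(s) + list(a)
        vis = sum(bi * vi for bi, vi in zip(self.b, v))
        return vis + sum(softplus(x) for x in self._inputs(v))

    def td_update(self, s, a, r, s_next, a_next, gamma=0.9, lr=0.1):
        """One SARSA-style step moving Q(s, a) toward r + gamma * Q(s', a')."""
        target = r + gamma * self.q_value(s_next, a_next)
        delta = target - self.q_value(s, a)
        v = list(s) + list(a)
        p = [sigmoid(x) for x in self._inputs(v)]  # expected hidden activities
        # Derivatives of Q: dQ/dW[i][k] = v_i * p_k, dQ/db_i = v_i, dQ/dc_k = p_k.
        for i, vi in enumerate(v):
            self.b[i] += lr * delta * vi
            for k, pk in enumerate(p):
                self.W[i][k] += lr * delta * vi * pk
        for k, pk in enumerate(p):
            self.c[k] += lr * delta * pk
        return delta

    def sample_action(self, s, n_sweeps=20, temperature=1.0, rng=None):
        """Gibbs-sample action bits from p(a | s), proportional to exp(Q(s, a) / T).

        Each bit is resampled given all the others, so the cost per sweep grows
        with the number of action bits rather than with the number of actions.
        """
        if rng is None:
            rng = random.Random(0)
        a = [rng.randint(0, 1) for _ in range(self.n_action)]
        for _ in range(n_sweeps):
            for j in range(self.n_action):
                a[j] = 1
                q_on = self.q_value(s, a)
                a[j] = 0
                q_off = self.q_value(s, a)
                p_on = sigmoid((q_on - q_off) / temperature)
                a[j] = 1 if rng.random() < p_on else 0
        return a
```

The sketch sums out the hidden units in closed form, so `q_value` is exact and its derivatives are simple products of visible states and expected hidden activities. Lowering `temperature` makes `sample_action` greedier. Enumerating all $2^{n}$ joint actions is never required.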