Continuous Rapid Action Value Estimates
A. Couëtoux, M. Milone, M. Brendel, H. Doghmen, M.S. & O. Teytaud; JMLR W&CP 20:19–31, 2011.
Abstract
In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain
of large-scale Markov Decision Process problems. MCTS most often uses the Upper
Confidence Tree (UCT) algorithm to handle the exploration versus exploitation
trade-off, while a few heuristics are used to guide the exploration in large
search spaces. Among these heuristics is the Rapid Action Value Estimate (RAVE).
This paper is concerned with extending the RAVE heuristic to continuous action
and state spaces. The approach is experimentally validated on two benchmark
problems: the artificial treasure hunt game and a real-world energy management
problem.
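
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of how a UCT-style exploration bonus and a RAVE estimate are commonly combined when actions are discrete. It is not the continuous extension proposed in the paper. The function names, the blending schedule beta = k / (k + visits), and the constants c and k are assumptions chosen for illustration only.

```python
import math


def rave_blend(mean_reward, visits, rave_mean, k=50.0):
    """Blend a node's own mean reward with its RAVE mean.

    The weight beta = k / (k + visits) is an assumed schedule: it starts at 1
    (trust the RAVE estimate) and decays toward 0 as the node is visited more.
    """
    beta = k / (k + visits)
    return (1.0 - beta) * mean_reward + beta * rave_mean


def uct_rave_score(mean_reward, visits, rave_mean, parent_visits,
                   c=math.sqrt(2), k=50.0):
    """Blended value plus a UCB1-style exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited actions are tried first
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return rave_blend(mean_reward, visits, rave_mean, k) + bonus


def select_action(stats, parent_visits):
    """Pick the action with the highest score.

    stats maps each action to a tuple (mean_reward, visits, rave_mean).
    """
    return max(stats, key=lambda a: uct_rave_score(*stats[a], parent_visits))
```

With stats = {"left": (0.5, 10, 0.5), "right": (0.5, 10, 0.9)} and parent_visits = 20, both actions receive the same exploration bonus, so select_action returns "right" because its RAVE mean is higher. A finite set of actions like this is exactly what a continuous action space lacks, which is the setting the paper addresses.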