Continuous Rapid Action Value Estimates
A. Couëtoux, M. Milone, M. Brendel, H. Doghmen, M.S. & O. Teytaud; JMLR W&CP 20:19–31, 2011.
In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is the Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristic to continuous action and state spaces. The approach is experimentally validated on two problems: the treasure hunt game, an artificial benchmark, and a real-world energy management problem.
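For readers unfamiliar with the heuristic, the standard discrete-action RAVE blending that the paper extends can be sketched as follows. This is a minimal Python sketch, not the paper's continuous-space method; the β schedule follows Gelly & Silver's formulation, and the constants `c` and `k` are illustrative assumptions:

```python
import math

def ucb_rave_score(q, n, q_rave, n_parent, c=1.4, k=1000.0):
    """Blend a node's own Monte-Carlo value with its RAVE (all-moves-as-first)
    value, then add a UCB exploration bonus. c and k are illustrative constants."""
    beta = math.sqrt(k / (3 * n + k))        # RAVE weight, decays as the node is visited more
    blended = (1 - beta) * q + beta * q_rave  # early on trust RAVE, later trust own stats
    return blended + c * math.sqrt(math.log(n_parent) / n)  # UCB exploration term

def select_child(children, n_parent):
    """children: dicts with the node's own stats (q, n) and its RAVE value (q_rave)."""
    return max(children, key=lambda s: ucb_rave_score(s["q"], s["n"], s["q_rave"], n_parent))
```

With few visits, β is close to 1 and the shared RAVE statistics dominate; as visit counts grow, the score converges to plain UCB on the node's own value, which is what makes RAVE useful in large search spaces where per-node statistics are initially sparse.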