Lyapunov Design for Safe Reinforcement Learning
Theodore J. Perkins, Andrew G. Barto;
3(Dec):803-832, 2002.
Abstract
Lyapunov design methods are used widely in control engineering to
design controllers that achieve qualitative objectives, such as
stabilizing a system or maintaining a system's state in a desired
operating range. We propose a method for constructing safe, reliable
reinforcement learning agents based on Lyapunov design principles. In
our approach, an agent learns to control a system by switching among a
number of given, base-level controllers. These controllers are
designed using Lyapunov domain knowledge so that
any switching
policy is safe and enjoys basic performance guarantees. Our approach
thus ensures qualitatively satisfactory agent behavior for virtually
any reinforcement learning algorithm and at all times, including while
the agent is learning and taking exploratory actions. We demonstrate
the process of designing safe agents for four different control
problems. In simulation experiments, we find that our theoretically
motivated designs also enjoy a number of practical benefits, including
reasonable performance both initially and throughout learning, as well
as accelerated learning.
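To make the core idea concrete, the following is a minimal sketch of the approach described above: the reinforcement learning agent chooses only among base-level controllers, each of which is guaranteed to decrease a Lyapunov function, so every switching policy, including a purely exploratory one, drives the system toward the goal region. The one-dimensional system, the specific controllers and costs, the Lyapunov function V(x) = x^2, and the tabular Q-learning update are all illustrative assumptions for this sketch and are not taken from the paper.

```python
import random

# Toy illustration (not from the paper): a 1-D system x_next = x + u with
# goal region |x| < 0.1. Every base controller contracts the state toward
# the origin, so the Lyapunov function V(x) = x^2 decreases regardless of
# which controller the agent selects. The learning agent only decides
# *which* controller to run at each step, so any switching policy is safe
# by construction, even while the agent is exploring.

GOAL = 0.1

def V(x):
    return x * x

# Hypothetical base controllers: each contracts x by a different factor,
# so V strictly decreases under every choice.
BASE_CONTROLLERS = [
    lambda x: -0.2 * x,   # slow, cautious controller
    lambda x: -0.5 * x,   # medium
    lambda x: -0.9 * x,   # fast, but with a higher per-step cost below
]

COSTS = [0.1, 0.4, 1.0]  # illustrative per-step cost of each controller

def step(x, a):
    """Apply base controller a; return next state and reward (negative cost)."""
    x_next = x + BASE_CONTROLLERS[a](x)
    assert V(x_next) <= V(x), "every base controller must decrease V"
    return x_next, -COSTS[a]

def bucket(x, n=20, lo=-2.0, hi=2.0):
    """Discretize the state for a tabular Q-function."""
    return int((min(max(x, lo), hi) - lo) / (hi - lo) * (n - 1))

# Tabular Q-learning over controller choices with epsilon-greedy exploration.
Q = [[0.0] * len(BASE_CONTROLLERS) for _ in range(20)]
alpha, gamma, eps = 0.1, 0.99, 0.2

for episode in range(500):
    x = random.uniform(-2.0, 2.0)
    while abs(x) >= GOAL:
        s = bucket(x)
        if random.random() < eps:
            a = random.randrange(len(BASE_CONTROLLERS))
        else:
            a = max(range(len(BASE_CONTROLLERS)), key=lambda k: Q[s][k])
        x_next, r = step(x, a)
        target = r + gamma * max(Q[bucket(x_next)])
        Q[s][a] += alpha * (target - Q[s][a])
        x = x_next
```

In this sketch the safety guarantee comes entirely from the design of the action set, not from the learning algorithm: learning only affects how cheaply the goal is reached, which mirrors the paper's point that qualitative guarantees hold for virtually any reinforcement learning algorithm and at all times during learning.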