Online Non-stochastic Control with Partial Feedback
Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou.
Year: 2023, Volume: 24, Issue: 273, Pages: 1–50
Abstract
Online control with non-stochastic disturbances and adversarially chosen convex cost functions, referred to as online non-stochastic control, has recently attracted increasing attention. We study online non-stochastic control with partial feedback, where the learner can access only partially observed states and partially informed (bandit) costs. This setting arises naturally in real-world decision-making applications and strictly generalizes special cases that previous works studied separately. We propose the first online algorithm for this problem, with $\tilde{O}(T^{3/4})$ regret against the best policy in hindsight, where $T$ denotes the time horizon and the $\tilde{O}(\cdot)$ notation omits poly-logarithmic factors in $T$. To further enhance the algorithm's robustness to changing environments, we then design a novel method with a two-layer structure to optimize the dynamic regret, a more challenging measure that competes with time-varying policies. Our method is based on the online ensemble framework and treats the controller above as the base learner. On top of that, we design two different meta-combiners to simultaneously handle the unknown variation of environments and the memory issue arising from online control. We prove that the two resulting algorithms enjoy $\tilde{O}(T^{3/4}(1+P_T)^{1/2})$ and $\tilde{O}(T^{3/4}(1+P_T)^{1/4}+T^{5/6})$ dynamic regret, respectively, where $P_T$ measures the environmental non-stationarity. Our results are further extended to unknown transition matrices. Finally, empirical studies on both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings.
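The two dynamic regret rates stated in the abstract trade off differently as the non-stationarity measure P_T grows. The following is a minimal numerical sketch of that comparison, not part of the paper: it drops all constants and poly-logarithmic factors, and the function names `bound_first` and `bound_second` as well as the chosen values of T and P_T are illustrative assumptions.

```python
# Illustrative comparison of the two dynamic regret rates from the abstract.
# Assumption: constants and poly-logarithmic factors are ignored, so only
# the growth in T and P_T is compared, not actual regret values.

def bound_first(T, P_T):
    """Rate T^{3/4} * (1 + P_T)^{1/2}."""
    return T ** 0.75 * (1 + P_T) ** 0.5

def bound_second(T, P_T):
    """Rate T^{3/4} * (1 + P_T)^{1/4} + T^{5/6}."""
    return T ** 0.75 * (1 + P_T) ** 0.25 + T ** (5 / 6)

if __name__ == "__main__":
    T = 10 ** 6  # hypothetical time horizon
    # Hypothetical non-stationarity levels, from stationary to linear in T.
    for P_T in (0, T ** 0.25, T ** 0.5, T):
        first, second = bound_first(T, P_T), bound_second(T, P_T)
        smaller = "first" if first < second else "second"
        print(f"P_T={P_T:>12.1f}  first={first:.3e}  second={second:.3e}  smaller={smaller}")
```

Under these simplifications, the first rate reduces to the static T^{3/4} rate when P_T = 0 and is the smaller of the two for small P_T, while the second rate becomes smaller once P_T is large, because its dependence on P_T is only to the power 1/4.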