Online Non-stochastic Control with Partial Feedback
Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou; 24(273):1–50, 2023.
Abstract
Online control with non-stochastic disturbances and adversarially chosen convex cost functions, referred to as online non-stochastic control, has recently attracted increasing attention. We study online non-stochastic control with partial feedback, where learners can only access partially observed states and partially informed (bandit) costs. The problem setting arises naturally in real-world decision-making applications and strictly generalizes special cases studied separately in previous works. We propose the first online algorithm for this problem, with an $\tilde{O}(T^{3/4})$ regret competing with the best policy in hindsight, where $T$ denotes the time horizon and the $\tilde{O}(\cdot)$-notation omits poly-logarithmic factors in $T$. To further enhance the algorithm's robustness to changing environments, we then design a novel method with a two-layer structure to optimize the dynamic regret, a more challenging measure that competes with time-varying policies. Our method is based on the online ensemble framework, treating the controller above as the base learner. On top of that, we design two different meta-combiners to simultaneously handle the unknown variation of environments and the memory issue arising from online control. We prove that the two resulting algorithms enjoy $\tilde{O}(T^{3/4}(1+P_T)^{1/2})$ and $\tilde{O}(T^{3/4}(1+P_T)^{1/4}+T^{5/6})$ dynamic regret respectively, where $P_T$ measures the environmental non-stationarity. Our results are further extended to unknown transition matrices. Finally, empirical studies in both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings.
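To get a feel for how the two dynamic regret rates above compare, the following minimal sketch evaluates both expressions numerically. It is only indicative: constants and poly-logarithmic factors are dropped, and the function names `bound_sqrt`, `bound_quartic`, and `balance_point` are hypothetical helpers introduced here, not part of the paper.

```python
def bound_sqrt(T, P):
    """First rate, T^{3/4} (1 + P_T)^{1/2}, with constants and logs dropped."""
    return T ** 0.75 * (1 + P) ** 0.5


def bound_quartic(T, P):
    """Second rate, T^{3/4} (1 + P_T)^{1/4} + T^{5/6}, constants and logs dropped."""
    return T ** 0.75 * (1 + P) ** 0.25 + T ** (5 / 6)


def balance_point(T):
    """Value of P_T at which the first rate equals the T^{5/6} term.

    Solving T^{3/4} (1 + P_T)^{1/2} = T^{5/6} gives 1 + P_T = T^{1/6}.
    This marks only the order of the crossover, not its exact location.
    """
    return T ** (1 / 6) - 1
```

Under these simplifications, the first rate is smaller when $P_T$ is well below the order of $T^{1/6}$ (for instance in a nearly stationary environment with $P_T = 0$), while the second rate is smaller once $P_T$ grows well beyond that order.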
© JMLR 2023.