
Online Non-stochastic Control with Partial Feedback

Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou; 24(273):1−50, 2023.

Abstract

Online control with non-stochastic disturbances and adversarially chosen convex cost functions, referred to as online non-stochastic control, has recently attracted increasing attention. We study online non-stochastic control with partial feedback, where learners can only access partially observed states and partially informed (bandit) costs. This setting arises naturally in real-world decision-making applications and strictly generalizes special cases studied separately in previous works. We propose the first online algorithm for this problem, attaining an $\tilde{\mathcal{O}}(T^{3/4})$ regret against the best policy in hindsight, where $T$ denotes the time horizon and the $\tilde{\mathcal{O}}(\cdot)$-notation omits poly-logarithmic factors in $T$. To further enhance the algorithm's robustness to changing environments, we then design a novel method with a two-layer structure to optimize the dynamic regret, a more challenging measure that competes with time-varying policies. Our method builds on the online ensemble framework, treating the controller above as the base learner. On top of that, we design two different meta-combiners to simultaneously handle the unknown variation of the environments and the memory issue arising from online control. We prove that the two resulting algorithms enjoy $\tilde{\mathcal{O}}(T^{3/4}(1+P_T)^{1/2})$ and $\tilde{\mathcal{O}}(T^{3/4}(1+P_T)^{1/4}+T^{5/6})$ dynamic regret respectively, where $P_T$ measures the environmental non-stationarity. Our results are further extended to the setting of unknown transition matrices. Finally, empirical studies in both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings.
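For readers unfamiliar with the two performance measures mentioned above, the following is a minimal sketch of the standard definitions used in online non-stochastic control; the notation (per-round cost $c_t$, state $x_t$, control $u_t$, comparator policy class $\Pi$) follows common conventions in this literature and is an illustrative assumption rather than a quotation from the paper.

\[
\text{Reg}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\big(x_t^{\pi}, u_t^{\pi}\big),
\qquad
\text{D-Reg}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \sum_{t=1}^{T} c_t\big(x_t^{\pi_t}, u_t^{\pi_t}\big),
\]

where the static regret compares against the single best policy $\pi \in \Pi$ in hindsight, the dynamic regret compares against an arbitrary time-varying sequence of policies $\pi_1, \dots, \pi_T \in \Pi$, and a path-length quantity of the form $P_T = \sum_{t=2}^{T} \lVert \pi_t - \pi_{t-1} \rVert$ measures the environmental non-stationarity appearing in the bounds above.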

[abs][pdf][bib]       
© JMLR 2023.
