Exploration Bonus for Policy Optimization
Policy optimization is a widely used method in reinforcement learning. However, due to its local search nature, it can take exponential time to find the optimal policy, even in tabular Markov decision processes (MDPs). In this talk, I will demonstrate how to address this issue by adding an “exploration bonus” that guides the learner to perform global exploration. Furthermore, I will discuss how to use the method in conjunction with linear function approximation. The talk is based on the NeurIPS 2021 paper “Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses,” coauthored by Haipeng Luo, Chen-Yu Wei, and Chung-Wei Lee.
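To make the idea concrete, here is a minimal sketch of a generic count-based exploration bonus attached to a REINFORCE-style policy-gradient learner on a toy chain MDP. The environment, constants, and the simple 1/sqrt(count) bonus are all illustrative assumptions for exposition; this is not the dilated-bonus construction from the paper.

```python
import numpy as np

# Hypothetical 5-state "chain" MDP (illustrative, not from the paper):
# action 1 moves right, action 0 resets to state 0; only arriving at the
# last state yields reward.  Local search tends to get stuck near state 0,
# which is exactly the failure mode an exploration bonus counteracts.
n_states, n_actions, horizon = 5, 2, 8

def step(s, a):
    s = s + 1 if (a == 1 and s < n_states - 1) else 0
    return s, (1.0 if s == n_states - 1 else 0.0)

rng = np.random.default_rng(0)
theta = np.zeros((n_states, n_actions))   # softmax policy parameters
counts = np.ones((n_states, n_actions))   # visit counts used by the bonus
lr, bonus_coef = 0.5, 0.3                 # illustrative constants

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

episode_returns = []
for episode in range(500):
    s, traj, ep_r = 0, [], 0.0
    for t in range(horizon):
        a = rng.choice(n_actions, p=policy(s))
        s2, r = step(s, a)
        # Count-based bonus ~ 1/sqrt(N(s,a)): rarely tried actions look
        # artificially rewarding, which drives global exploration.
        bonus = bonus_coef / np.sqrt(counts[s, a])
        counts[s, a] += 1
        traj.append((s, a, r + bonus))
        ep_r += r
        s = s2
    episode_returns.append(ep_r)
    # REINFORCE-style update on the bonus-augmented return-to-go.
    G = 0.0
    for (s_t, a_t, r_t) in reversed(traj):
        G += r_t
        grad = -policy(s_t)               # gradient of log-softmax ...
        grad[a_t] += 1.0                  # ... for the chosen action
        theta[s_t] += lr * G * grad
```

Without the bonus term, the resetting action can dominate early and the learner may never discover the rewarding state; the bonus keeps under-visited state-action pairs attractive until they have been tried enough.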
Chen-Yu Wei is an Assistant Professor in the Computer Science Department at the University of Virginia. Previously, he was a Postdoctoral Associate at the Massachusetts Institute of Technology. He received a Ph.D. in Computer Science from the University of Southern California, an M.S. in Communication Engineering from National Taiwan University, and a B.S. in Electrical Engineering from National Taiwan University. His research focuses on fundamental problems in interactive decision making and reinforcement learning. Specifically, he is interested in developing robust and adaptive algorithms that interact with potentially non-stationary or adversarial environments. He is also interested in understanding the sample and computational complexity of reinforcement learning problems, and in devising efficient algorithms to tackle them.