CIS Seminar: “Structural Foundations of Efficient Reinforcement Learning:
February 16, 2021 at 3:00 PM - 4:00 PM
Details
Organizer
Abstract:
The design of learning agents which observe, interact with, and manipulate their environment to optimize desirable behaviors is a long-standing goal in machine learning, with roots in artificial intelligence, adaptive experimental design and adaptive feedback control. In machine learning, these questions are typically studied in the area of reinforcement learning (RL), which has seen a recent surge of interest both due to potential applications, from robotics and autonomous systems to healthcare and recommendations; as well as popular successes such as superhuman performance in games and dexterous manipulation of a Rubik’s cube.
On the theoretical front, foundations for sample efficient learning were laid in the early 2000’s, for problems where the agent perceives the environment through relatively simple observations. However, these settings fail to capture the complex sensorimotor observation streams which most application domains naturally contain. In this talk, I will describe a research program aimed at addressing this critical gap in our theoretical understanding.
I will begin by highlighting some key challenges faced by an RL agent, and the importance of understanding the structure of real-world applications to address these challenges. With this aim, I will then introduce a complexity measure, called Bellman rank, for general RL problems. Crucially, many application domains naturally exhibit a small Bellman rank, and I will describe how low Bellman rank enables sample efficient RL. Bellman rank remains one of the most general ways of measuring the complexity of RL problems, however, computationally practical algorithms for all problems with a small Bellman rank still elude us.
The second part of the talk will focus on algorithmic questions, designing optimization-based methods for solving a subclass of problems with a small Bellman rank. I will present an algorithm Policy Cover Policy Gradient (PC-PG), which comes with strong practical guarantees when the problem dynamics obey a certain linear structure. The algorithm is highly practical, and easily composes with modern deep learning libraries for an efficient implementation. I will confirm its effectiveness beyond the confines of the theoretical assumptions in empirical evaluation against popular baselines.
Finally, I will conclude with a brief synopsis of work I have done on Contextual Bandits, a much smaller, yet practically useful subclass of RL. The research here has led to the design and creation of a general purpose cloud service at Microsoft, which powers many applications of Contextual Bandits both inside and outside the company.
The first two parts of the talk are based on the papers https://arxiv.org/abs/1610.09512 and https://arxiv.org/pdf/2007.08459.pdf respectively.

