When: Friday, November 22, 2024, 11:00 AM - 12:00 PM CT
Where: Chambers Hall, Ruan Conference Room – lower level, 600 Foster St, Evanston, IL 60208
Audience: Faculty/Staff - Students - Postdocs - Graduate Students
Cost: free
Contact: Kisa Kowal, (847) 491-3974
Group: Department of Statistics and Data Science
Category: Academic, Lectures & Meetings
Structure-driven design of reinforcement learning algorithms: a tale of two estimators
Wenlong Mou, Assistant Professor of Statistical Sciences, University of Toronto
Abstract: Reinforcement learning (RL) offers a flexible framework for sequential decision-making in uncertain environments, and its success depends heavily on efficiently learning value functions. Over the years, a diverse range of RL algorithms has been proposed, but at their core two foundational principles stand out: solving the Bellman fixed-point equations (known as "bootstrapping"), or averaging rollout rewards. Despite the success of both, finding the optimal trade-off between these principles in practical applications remains elusive. Current theoretical guarantees, whether worst-case or asymptotic, often fall short of providing actionable insights.
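For readers unfamiliar with the two principles, here is a minimal sketch contrasting them on a toy Markov reward process. Everything below (the chain, rewards, discount factor, and step counts) is an assumption chosen purely for illustration, not material from the talk: Monte Carlo estimation averages discounted rollout returns, while TD(0) bootstraps off its own current value estimate to solve the Bellman fixed-point equation V = r + γPV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process (illustrative assumption): 3 states,
# transition matrix P and per-state reward vector r.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
r = np.array([0.0, 1.0, -1.0])
gamma = 0.9
n_states = 3

def step(s):
    """Collect the reward at state s and sample the next state."""
    s_next = rng.choice(n_states, p=P[s])
    return r[s], s_next

# Principle 1: rollout averaging (Monte Carlo) -- average truncated
# discounted returns from each starting state.
def monte_carlo_value(n_rollouts=2000, horizon=200):
    V = np.zeros(n_states)
    for s0 in range(n_states):
        returns = []
        for _ in range(n_rollouts):
            s, G, disc = s0, 0.0, 1.0
            for _ in range(horizon):
                rew, s = step(s)
                G += disc * rew
                disc *= gamma
            returns.append(G)
        V[s0] = np.mean(returns)
    return V

# Principle 2: bootstrapping (TD(0)) -- stochastic approximation of the
# Bellman fixed point V = r + gamma * P V along a single trajectory.
def td0_value(n_steps=200_000, alpha=0.01):
    V = np.zeros(n_states)
    s = 0
    for _ in range(n_steps):
        rew, s_next = step(s)
        V[s] += alpha * (rew + gamma * V[s_next] - V[s])  # TD update
        s = s_next
    return V

if __name__ == "__main__":
    # Exact solution of the Bellman equation for comparison.
    V_exact = np.linalg.solve(np.eye(n_states) - gamma * P, r)
    print("exact:", V_exact)
    print("MC   :", monte_carlo_value())
    print("TD(0):", td0_value())
```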
In this talk, I will discuss recent advances in methods that optimally reconcile bootstrapping and rollout for policy evaluation. The bulk of the talk will focus on a new class of estimators that strikes an optimal balance between temporal difference learning and Monte Carlo methods. Through a statistical lens, I will highlight why the local structure of the underlying Markov chain determines the fundamental complexity of estimation, and how our estimator adapts to these structures. Extending this perspective to continuous-time RL, I will also explore how the elliptic structure of diffusion processes provides key insights for making algorithmic choices.
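As a point of reference for the balance the abstract describes, the classical dial between the two principles is TD(λ): λ = 0 recovers pure bootstrapping, and λ → 1 approaches Monte Carlo. The sketch below reuses step, gamma, and n_states from the previous example; it shows the textbook accumulating-trace update, not the adaptive estimator presented in the talk.

```python
# TD(lambda) with accumulating eligibility traces: lam=0 gives TD(0)
# (pure bootstrapping), lam -> 1 approaches Monte Carlo (pure rollout).
# Classical interpolation only; the talk's estimator adapts this trade-off.
def td_lambda_value(lam=0.5, n_steps=200_000, alpha=0.01):
    V = np.zeros(n_states)
    e = np.zeros(n_states)  # eligibility trace over states
    s = 0
    for _ in range(n_steps):
        rew, s_next = step(s)
        delta = rew + gamma * V[s_next] - V[s]  # TD error
        e *= gamma * lam   # decay all traces
        e[s] += 1.0        # bump the trace of the visited state
        V += alpha * delta * e
        s = s_next
    return V
```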