Name: Bradly Stadie: Making Sense of the Past with Graph Based Planning and Cold Diffusion
Start: 2023-02-03T12:00:00-06:00
End: 2023-02-03T13:00:00-06:00
Location: Technological Institute, Mechanical Engineering, B211

Northwestern Events Calendar



When: Friday, February 3, 2023
12:00 PM - 1:00 PM CT

Where: Technological Institute, Mechanical Engineering, B211, 2145 Sheridan Road, Evanston, IL 60208

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Contact: Amy Nedoss

Group: Center for Robotics and Biosystems (CRB)

Category: Academic, Lectures & Meetings

Description:

Join the Center for Robotics and Biosystems (CRB) for the February Speaker Series

Speaker: Bradly Stadie, Assistant Professor of Statistics, Northwestern University

Date and Time: Friday, February 3 at 12:00 p.m. CT

Location: Tech B211 and Zoom
Zoom Link: https://tinyurl.com/CRBSeminar
• NU-authenticated attendees will be automatically admitted. Others, please email amy.nedoss@northwestern.edu to be admitted from the waiting room.

Abstract:
How should we train reinforcement learning agents that are capable of abstract planning? One recent technique that has shown considerable promise is Search on the Replay Buffer. In this technique, agents treat the states stored in a buffer of past experience as nodes of a graph and run a graph search algorithm over them. The result of this search is a sequence of known good states that the agent can use as waypoints to navigate its environment. This approach has two obvious shortcomings: (1) if we assume we can transition between every pair of states, the complexity of the path planning problem is exponential in the size of the buffer; (2) exploring different possible paths remains a challenge, even if we prune the number of nodes.
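The graph-search step described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes states are coordinate tuples, uses a hypothetical distance function `dist` (in practice this would be a learned estimate of the steps needed to reach one state from another; the usage below simply passes Euclidean distance), and drops any edge longer than a hypothetical cutoff `max_edge` before running Dijkstra's algorithm to return a sequence of waypoint states.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def plan_over_buffer(
    buffer: Sequence[State],
    start: State,
    goal: State,
    dist: Callable[[State, State], float],
    max_edge: float,
) -> List[State]:
    """Dijkstra over a graph whose nodes are start, buffer states, and goal.

    An edge i -> j is kept only if dist(i, j) <= max_edge, a hypothetical
    cutoff standing in for "the agent can reliably reach j from i".
    Returns the waypoint sequence from start to goal, or [] if unreachable.
    """
    nodes = [start, *buffer, goal]
    n = len(nodes)
    src, dst = 0, n - 1
    best = [math.inf] * n   # best known cost from start to each node
    prev = [-1] * n         # predecessor on the best path
    best[src] = 0.0
    frontier = [(0.0, src)]
    while frontier:
        d, i = heapq.heappop(frontier)
        if d > best[i]:
            continue  # stale queue entry
        if i == dst:
            break     # goal settled; its cost is final
        for j in range(n):  # dense graph: consider every other stored state
            if j == i:
                continue
            w = dist(nodes[i], nodes[j])
            if w > max_edge:
                continue  # prune edges the agent likely cannot traverse
            if d + w < best[j]:
                best[j] = d + w
                prev[j] = i
                heapq.heappush(frontier, (best[j], j))
    if math.isinf(best[dst]):
        return []
    path = []
    i = dst
    while i != -1:  # walk predecessors back to the start
        path.append(nodes[i])
        i = prev[i]
    return path[::-1]


if __name__ == "__main__":
    waypoints = plan_over_buffer(
        [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], (0.0, 0.0), (4.0, 0.0), math.dist, 1.5
    )
    print(waypoints)
```

Because every node expansion scans every other stored state, the cost of this dense search grows quickly with buffer size, which is one motivation for shrinking the buffer to a small set of representative states.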

To overcome these shortcomings, we introduce L3P, an algorithm for learning latent landmarks for planning. L3P learns landmarks in a latent space where the distance between two graph nodes is optimized to equal the number of steps it takes to transition between the corresponding environment states. We introduce a novel clustering algorithm that forces states that are close under this metric to cluster together, reducing the buffer to a few representative latent landmarks. Finally, we consider cold diffusion approaches to this path planning problem. We show that sequence planning over a buffer can be recast as a diffusion model, and we introduce Maximum Entropy Subgoal Skipping (MESS) to help with exploration.
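Two ingredients of the latent-landmark idea can be sketched as below. This is a hedged illustration under stated assumptions, not L3P itself: the latent vectors `z` are assumed to come from some already-trained encoder; the hypothetical `step_distance_loss` shows the objective of matching latent distance to the number of environment steps between two states on the same trajectory; and plain k-means (a swapped-in technique, since the abstract does not describe the paper's clustering algorithm) reduces the buffer to `k` landmark centers in the hypothetical `latent_landmarks`.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def step_distance_loss(z: Sequence[Vec], pairs: Sequence[Tuple[int, int]]) -> float:
    """Mean squared error between latent distance and environment step count.

    Assumption: z[t] is the latent code of the state at time t of ONE
    trajectory, and each (i, j) in pairs has i <= j, so the target
    distance is the number of steps j - i.
    """
    errs = [(math.dist(z[i], z[j]) - (j - i)) ** 2 for i, j in pairs]
    return sum(errs) / len(errs)


def latent_landmarks(z: Sequence[Vec], k: int, iters: int = 20, seed: int = 0) -> List[Vec]:
    """Plain k-means in latent space, standing in for L3P's clustering.

    Points close under the latent metric end up assigned to the same
    center; the k centers act as landmarks replacing the raw buffer.
    """
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(list(z), k)]  # init from data points
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for v in z:  # assign each latent point to its nearest center
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            groups[nearest].append(v)
        # move each center to the mean of its group; keep it if the group is empty
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    print(sorted(latent_landmarks(pts, 2)))
```

The returned centers could then replace raw buffer states as the nodes of a graph search, so the planner works over `k` nodes instead of the whole buffer.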

Bio:
Bradly Stadie's research explores techniques for developing general machine intelligence. Recently, foundational models such as GPT-3 have provided a promising avenue toward training intelligent machines. In particular, these models show that we can leverage large quantities of unsupervised data to learn the latent structure underlying a data space. In this latent space, planning and abstract reasoning become much more tractable.

In spite of this promise, there is currently no GPT-3 equivalent in reinforcement learning. The current leading method (learning a dynamics model to predict the agent's next state, then using this model to bootstrap a planning or curiosity module) has proven ineffective. Dr. Stadie's current research proposes an enticing alternative: goal-reaching should be used as the foundational model for reinforcement learning. The idea is that agents learn, in an unsupervised fashion, how to generate and reach various goal states in their environments. Agents can then bootstrap from this goal-reaching ability, breaking complex goals into a series of simpler tasks. This will endow agents with the ability to plan and reason over long time horizons, an essential capacity for the emergence of general intelligence. A deep relationship between unsupervised goal-reaching and imitation learning comes up frequently in his research, and he also draws on tools from graph search, causal inference, and generative networks.
