Statistics and Data Science Seminar: "When do spectral gradient updates help in deep learning?"

Friday, May 1, 2026 | 11:00 AM - 12:00 PM CT
Chambers Hall, Ruan Conference Room (lower level), 600 Foster St, Evanston, IL 60208

Dmitriy Drusvyatskiy, Professor and HDSI Faculty Fellow, Halıcıoğlu Data Science Institute (HDSI), University of California San Diego

Abstract: Spectral gradient methods, such as the recently proposed Muon optimizer, are a promising alternative to standard gradient descent for training deep neural networks and transformers. Yet, it remains unclear in which regimes these spectral methods are expected to perform better. In this talk, I will present a simple condition that predicts when a spectral update yields a larger decrease in the loss than a standard gradient step. Informally, this criterion holds when, on the one hand, the gradient of the loss with respect to each parameter block has a nearly uniform spectrum—measured by its nuclear-to-Frobenius ratio—while, on the other hand, the incoming activation matrix has low stable rank. It is this mismatch in the spectral behavior of the gradient and the propagated data that underlies the advantage of spectral updates. Reassuringly, this condition naturally arises in a variety of settings, including random feature models, neural networks, and transformer architectures. I will conclude by showing that these predictions align with empirical results in synthetic regression problems and in small-scale language model training.
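The two quantities named in the abstract are easy to compute directly. As an illustrative sketch (not the speaker's code), the snippet below evaluates the nuclear-to-Frobenius ratio of a gradient block and the stable rank of an activation matrix with NumPy; the informal criterion above would favor a spectral update when the first quantity is large (near-uniform spectrum) and the second is small (low stable rank).

```python
import numpy as np

def nuclear_to_frobenius(G):
    """Nuclear norm over Frobenius norm of G.

    Ranges from 1 (rank-one spectrum) up to sqrt(rank(G))
    (perfectly uniform spectrum), so larger values indicate
    a flatter singular-value profile.
    """
    s = np.linalg.svd(G, compute_uv=False)
    return s.sum() / np.sqrt((s ** 2).sum())

def stable_rank(A):
    """||A||_F^2 / ||A||_2^2, a soft proxy for the rank of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

# Extremes of the two measures, for intuition:
I = np.eye(32)                       # uniform spectrum
print(nuclear_to_frobenius(I))       # sqrt(32) ≈ 5.66, the maximum for 32x32

u = np.ones((64, 1))
v = np.ones((1, 32))
R = u @ v                            # rank-one activation matrix
print(stable_rank(R))                # 1.0, the minimum possible
```

A spectral update in the sense of Muon replaces the gradient block `G` by `U @ Vt` from its SVD `G = U @ diag(s) @ Vt`, i.e., it keeps the singular directions but flattens the singular values, which is exactly why the spectrum's uniformity matters for the comparison.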

Cost: free

Audience

  • Faculty/Staff
  • Student
  • Post Docs
  • Graduate Students

Contact

Kisa Kowal
(847) 491-3974

Interest

  • Academic (general)
