Northwestern Events Calendar

Feb 24, 2025

CS Seminar: Vector-Centric Machine Learning Systems: A Cross-Stack Approach (Wenqi Jiang)

When: Monday, February 24, 2025
12:00 PM - 1:00 PM CT

Where: Mudd Hall (formerly Seeley G. Mudd Library), 3514, 2233 Tech Drive, Evanston, IL 60208

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Cost: free

Contact: Wynante R Charles, (847) 467-8174

Group: Department of Computer Science (CS)

Category: Academic, Lectures & Meetings

Description:

Monday / CS Seminar
February 24th / 12:00 PM
Hybrid / Mudd 3514

Speaker
Wenqi Jiang, ETH Zurich

Talk Title
Vector-Centric Machine Learning Systems: A Cross-Stack Approach

Abstract
"Despite the recent popularity of large language models (LLMs), the transformer neural network invented eight years ago has remained largely unchanged. This prompts the question of whether machine learning (ML) systems research is solely about improving hardware and software for tensor operations. In this talk, I will argue that the future of machine learning systems extends far beyond model acceleration. Using the increasingly popular retrieval-augmented generation (RAG) paradigm as an example, I will show that the growing complexity of ML systems demands a deeply collaborative effort spanning data management, systems, computer architecture, and ML.

I will present RAGO and Chameleon, two pioneering works in this field. RAGO is the first systematic performance study of retrieval-augmented generation. It uncovers the intricate interactions between vector data systems and models, revealing drastically different performance characteristics across various RAG workloads. To navigate this complex landscape, RAGO introduces a system optimization framework to explore optimal system configurations for arbitrary RAG algorithms. Building on these insights, I will introduce Chameleon, the first heterogeneous accelerator system for RAG. Chameleon combines LLM and retrieval accelerators within a disaggregated architecture. The heterogeneity ensures efficient serving of both LLM inference and retrievals, while the disaggregation enables independent scaling of different system components to accommodate diverse RAG workload requirements. I will conclude the talk by emphasizing the necessity of cross-stack co-design for future ML systems and the abundance of opportunities ahead of us."

Biography
Wenqi Jiang is a final-year PhD student at ETH Zurich, advised by Gustavo Alonso and Torsten Hoefler. He aims to enable more efficient, next-generation machine learning systems. Rather than focusing on a single layer in the computing stack, Wenqi's research spans the intersections of data management, computer systems, and computer architecture. His work has driven advancements in several areas, including retrieval-augmented generation (RAG), vector search, and recommender systems. These contributions have earned him recognition as one of the ML and Systems Rising Stars, as well as the AMD HACC Outstanding Researcher Award.

Research/Interest Areas
Data management, computer systems, and computer architecture.
---
Zoom: https://northwestern.zoom.us/j/98746799161?pwd=8tJL888y1j8GrawbwOrTXKT7S9GQA4.1
Panopto: https://northwestern.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cd3b91de-4058-478b-8d42-b28901171354
Community Connections Topic: Black Women in Computing
