Curing AI Issues at the Source: The Power of Data-Centric Learning
Yanjie Fu, Associate Professor, School of Computing and Augmented Intelligence, Arizona State University
Abstract: Recent progress in AI has been driven largely by scaling models and compute. Yet in many real-world and scientific settings, AI failures are still rooted less in model architecture than in the data itself: missing or incomplete observations, noisy labels, distribution shift, imbalance, poor feature geometry, and weak coverage of the underlying domain. This talk argues for a shift from a model-centric view of AI to a data-centric learning perspective, where the central goal is not only to train better models, but also to reshape data so that it better supports learning. I will present a unifying view of data-centric learning through the lens of data-model knowledge alignment: data serves as the knowledge base, models learn knowledge from data, and poor alignment between the two leads to poor generalization, shortcut learning, instability, and low trust. I will introduce key directions in this space, including data curation, relabeling, synthetic data generation, feature selection, feature transformation, and data reprogramming. I will also highlight our recent work on AI4Data-RL, AI4Data-GenAi, and AI4Data-LLM&Agents. Overall, the talk will discuss how data-centric learning opens a new path toward more robust and trustworthy AI systems.
Cost: free
Audience
- Faculty/Staff
- Student
- Post Docs/Docs
- Graduate Students
Contact
Kisa Kowal
(847) 491-3974
Interest
- Academic (general)