Statistics and Data Science Seminar: "On Fine-Tuning Large Language Models with Less Labeling Cost"

Friday, October 13, 2023 | 2:00 PM - 3:00 PM CT
Chambers Hall, Ruan Conference Room – lower level, 600 Foster St, Evanston, IL 60208

Tuo Zhao, Assistant Professor, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech

Abstract: Labeled data is critical to the success of deep learning across various applications, including natural language processing, computer vision, and computational biology. While recent advances like pre-training have reduced the need for labeled data in these domains, increasing the availability of labeled data remains the most effective way to improve model performance. However, human labeling of data continues to be expensive, even when leveraging cost-effective crowd-sourced labeling services. Further, in many domains, labeling requires specialized expertise, which adds to the difficulty of acquiring labeled data.

In this talk, we demonstrate how to utilize weak supervision together with efficient computational algorithms to reduce data labeling costs. Specifically, we investigate various forms of weak supervision, including external knowledge bases, auxiliary computational tools, and heuristic rule-based labeling. We showcase the application of weak supervision to both supervised learning and reinforcement learning across various tasks, including natural language understanding, molecular dynamics simulation, and code generation.

 

Cost: free

Audience

  • Faculty/Staff
  • Student
  • Post Docs/Docs
  • Graduate Students

Contact

Kisa Kowal, (847) 491-3974

k-kowal@northwestern.edu

Interest

  • Academic (general)
