Northwestern Events Calendar

Statistics and Data Science Seminar: "LLM-Enhanced, Theme-Focused Science Discovery: A Retrieval and Structuring Approach"

When: Friday, May 30, 2025
11:00 AM - 12:00 PM CT

Where: Online

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Cost: free

Contact: Kisa Kowal, (847) 491-3974

Group: Department of Statistics and Data Science

Category: Academic, Lectures & Meetings

Description:

LLM-Enhanced, Theme-Focused Science Discovery: A Retrieval and Structuring Approach

Jiawei Han, Michael Aiken Chair Professor, Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign

ABSTRACT: Large Language Models (LLMs) may bring unprecedented power to scientific discovery. However, current LLMs may still encounter major challenges in effective scientific exploration because they lack in-depth, theme-focused data and knowledge. Retrieval-augmented generation (RAG) has recently become an interesting approach for augmenting LLMs with grounded, theme-specific datasets. We discuss the challenges of RAG and propose a retrieval and structuring (RAS) approach, which enhances RAG by improving retrieval quality and mining structures (e.g., extracting entities and relations and building knowledge graphs) to ensure effective integration of theme-specific data with LLMs. We show the promise of this approach for augmenting LLMs and discuss its potential power for LLM-enabled science exploration.

Short bio: Jiawei Han is the Michael Aiken Chair Professor in the Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign. He received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and Japan's Funai Achievement Award (2018), and was elected a Fellow of the Royal Society of Canada (2022). He is a Fellow of ACM and a Fellow of IEEE. He served as the Director of the Information Network Academic Research Center (INARC) (2009-2016), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab, and as co-Director of KnowEnG, a Center of Excellence in Big Data Computing (2014-2019), funded by the NIH Big Data to Knowledge (BD2K) Initiative. Currently, he serves on the executive committees of two NSF-funded research centers: MMLI (Molecule Maker Lab Institute), one of the NSF-funded national AI centers, since 2020, and I-GUIDE, the National Science Foundation (NSF) Institute for Geospatial Understanding through an Integrative Discovery Environment, since 2021.
