When:
Friday, February 21, 2025
11:00 AM - 12:00 PM CT
Where: Online
Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students
Cost: free
Contact:
Kisa Kowal
(847) 491-3974
Group: Department of Statistics and Data Science
Category: Academic, Lectures & Meetings
Knowledge-Guided Machine Learning for Scientific Discovery: Challenges and Opportunities
Xiaowei Jia, Assistant Professor, Department of Computer Science, University of Pittsburgh
Abstract:
Data science and machine learning (ML) models, which have found tremendous success in several commercial applications where large-scale data is available, e.g., computer vision and natural language processing, have met with limited success in scientific domains. Traditionally, physics-based models of dynamical systems have been used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to incomplete or inaccurate representations of the physical processes being modeled. Given rapid data growth due to advances in sensor technologies, there is a tremendous opportunity to systematically advance modeling in these domains by using machine learning methods. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery, since the “black box” use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, an uninformed data-driven search can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious relationships, predictors, and patterns. This problem becomes worse when there is a scarcity of labeled samples, which is quite common in science and engineering domains.
My work aims to build the foundations of knowledge-guided machine learning (KGML) by exploring several ways of bringing scientific knowledge and machine learning models together. In particular, I will discuss gaps and opportunities in scientific discovery and show the effectiveness of KGML in multiple applications of great societal and scientific relevance. My work also has the potential to greatly advance the pace of discovery in a number of scientific and engineering disciplines where physics-based models are used, e.g., hydrology, agriculture, climate science, materials science, power engineering, and biomedicine.