When: Friday, May 23, 2025, 11:00 AM - 12:00 PM CT
Where: Chambers Hall, Ruan Conference Room – lower level, 600 Foster St, Evanston, IL 60208
Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students
Cost: free
Contact: Kisa Kowal, (847) 491-3974
Group: Department of Statistics and Data Science
Category: Academic, Lectures & Meetings
Data Augmentation for Graph Regression
Meng Jiang, Associate Professor, Department of Computer Science and Engineering, University of Notre Dame
Abstract: Graph regression plays a key role in materials discovery by enabling the prediction of numerical properties of molecules and polymers. However, graph regression models often rely on training sets with only a few hundred labeled examples, and these labels are typically imbalanced. While a large number of unlabeled examples are available, they are often drawn from diverse domains, making them less effective for improving target label predictions. In machine learning, data augmentation refers to techniques that increase the size of the training set by generating slightly modified or synthetic versions of existing data. These methods are simple yet effective. In this talk, I will introduce three graph data augmentation techniques tailored for supervised learning, imbalanced learning, and transfer learning in graph regression tasks. The first technique leverages the mutual enhancement between model rationalization and data augmentation, improving both accuracy and interpretability in molecular and polymer property prediction. This approach demonstrates that graph data augmentation can be effectively performed in latent spaces. The second technique generates representations of additional data points with underrepresented labels to balance the training set. The third technique introduces a graph diffusion transformer (Graph DiT) that facilitates data-centric transfer learning, addressing the limitations of self-supervised methods when dealing with unlabeled graph data. Graph DiT integrates multiple properties, such as synthetic score and gas permeability, as condition constraints into diffusion models for multi-conditional polymer generation. Lastly, I will discuss foundation model approaches for materials discovery.