Northwestern Events Calendar

May 23, 2025

Statistics and Data Science Seminar: "Data Augmentation for Graph Regression"

When: Friday, May 23, 2025
11:00 AM - 12:00 PM CT

Where: Chambers Hall, Ruan Conference Room (lower level), 600 Foster St, Evanston, IL 60208

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Cost: free

Contact: Kisa Kowal, (847) 491-3974

Group: Department of Statistics and Data Science

Category: Academic, Lectures & Meetings

Description:

Data Augmentation for Graph Regression

Meng Jiang, Associate Professor, Department of Computer Science and Engineering, University of Notre Dame

Abstract: Graph regression plays a key role in materials discovery by enabling the prediction of numerical properties of molecules and polymers. However, graph regression models often rely on training sets with only a few hundred labeled examples, and these labels are typically imbalanced. While a large number of unlabeled examples are available, they are often drawn from diverse domains, making them less effective for improving target label predictions. In machine learning, data augmentation refers to techniques that increase the size of the training set by generating slightly modified or synthetic versions of existing data. These methods are simple yet effective. In this talk, I will introduce three graph data augmentation techniques tailored for supervised learning, imbalanced learning, and transfer learning in graph regression tasks. The first technique leverages the mutual enhancement between model rationalization and data augmentation, improving both accuracy and interpretability in molecular and polymer property prediction. This approach demonstrates that graph data augmentation can be effectively performed in latent spaces. The second technique generates representations of additional data points with underrepresented labels to balance the training set. The third technique introduces a graph diffusion transformer (Graph DiT) that facilitates data-centric transfer learning, addressing the limitations of self-supervised methods when dealing with unlabeled graph data. Graph DiT integrates multiple properties, such as synthetic score and gas permeability, as condition constraints into diffusion models for multi-conditional polymer generation. Lastly, we will discuss foundation model approaches for materials discovery.

