Name: Statistics Seminar Series: Haiyan Wang, “A classification method for predicting type 2 diabetes mellitus using sequencing data”
Start: 2018-10-24T11:00:00-05:00
End: 2018-10-24T12:00:00-05:00
Location: 2006 Sheridan Road, B02

Northwestern Events Calendar

Statistics Seminar Series: Haiyan Wang, “A classification method for predicting type 2 diabetes mellitus using sequencing data”

When: Wednesday, October 24, 2018
11:00 AM - 12:00 PM CT

Where: 2006 Sheridan Road, B02, Evanston, IL 60208

Audience: Faculty/Staff - Post Docs/Docs - Graduate Students

Contact: Kisa Kowal (847) 491-3974

Group: Department of Statistics and Data Science

Category: Academic

Description:

Department of Statistics Fall 2018 Seminar Series

A classification method for predicting type 2 diabetes mellitus using sequencing data

Speaker: Haiyan Wang, Professor, Department of Statistics, Kansas State University

Time: 11:00am

Abstract: Type 2 diabetes mellitus (T2DM) affects the lives of millions of people through its life-altering complications. Current methods of identifying genetic polymorphisms responsible for T2DM are limited by sample size and by low accuracy at the population level (AUC of 0.68 or below). This research presents a method to identify subtle effects of genetic variants using whole genome sequencing data and to improve prediction accuracy of T2DM at the population level. To achieve this, a new feature selection procedure and a classifier were proposed. The method involves (1) applying sparse principal component analysis (PCA) to genotype data to obtain orthogonal features; (2) using SNP-specific regularization parameters to reduce the false positive rate of feature selection; and (3) verifying feature relevance through Lasso penalized logistic regression in conjunction with sparse PCA. Applied to a dataset containing 625,597 SNPs and 23 environmental variables from each of 3,326 individuals, the method identified over 450 genetic variants that each have subtle effects on T2DM prediction. These variants, in conjunction with clinical characteristics, led to greatly improved prediction accuracy (AUC 0.79) for new patients at the population level. The proposed method is also computationally efficient, running 20 times faster than a Random Forest classifier, and thus provides a promising tool for large-scale genome-wide association studies.
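
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of two of its ingredients: Lasso (L1) penalized logistic regression, which drives the coefficients of irrelevant features to exactly zero, and the AUC used to report prediction accuracy. It is not the speaker's method. It omits sparse PCA and the SNP-specific regularization parameters, uses a single shared penalty, and runs on small synthetic data. All function names, parameter values, and the data-generating rule are assumptions made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X, y, penalty, step=0.1, iterations=300):
    """Proximal gradient descent for logistic regression with an L1 penalty.

    The intercept is not penalized. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(iterations):
        # Residual = predicted probability minus observed label, per sample.
        residuals = [
            sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - label
            for row, label in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            gradient = sum(r * row[j] for r, row in zip(residuals, X)) / n
            weights[j] = soft_threshold(weights[j] - step * gradient, step * penalty)
    return intercept, weights


def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Synthetic data (assumption): 200 samples, 6 features, only feature 0 is informative.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [1 if row[0] + 0.5 * random.gauss(0.0, 1.0) > 0 else 0 for row in X]

intercept, weights = fit_l1_logistic(X, y, penalty=0.05)
scores = [intercept + sum(w * x for w, x in zip(weights, row)) for row in X]
train_auc = auc(scores, y)
print("weights:", [round(w, 3) for w in weights])
print("training AUC:", round(train_auc, 3))
```

The AUC printed here is computed on the training data, so it is optimistic. The abstract's figures (0.68 and 0.79) refer to prediction for new patients, which would require held-out data.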

Joint work with Luann C Jung at Massachusetts Institute of Technology, and Xukun Li and Cen Wu at Kansas State University.

Location: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road
