Statistics Seminar Series (joint with Biostatistics): Qiwei Li, "Bayesian Modeling of Metagenomic Sequencing Data for Differential Abundance Analysis" 10/28/2020: Northwestern Events Calendar

Statistics Seminar Series (joint with Biostatistics): Qiwei Li, "Bayesian Modeling of Metagenomic Sequencing Data for Differential Abundance Analysis"

When: Wednesday, October 28, 2020
11:00 AM - 12:00 PM CT

Where: Online

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Contact: Kisa Kowal (847) 491-3974

Group: Department of Statistics and Data Science

Category: Academic, Lectures & Meetings

Description:

Department of Statistics 2020-2021 Seminar Series (joint with Biostatistics) - Fall 2020

"Bayesian Modeling of Metagenomic Sequencing Data for Differential Abundance Analysis"

Speaker: Qiwei Li, Assistant Professor of Statistics, Department of Mathematical Sciences, The University of Texas at Dallas

Abstract: Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated the study of the microbiome. One of the most essential questions that can help us decipher the relationship between the microbiome and disease is how to identify differentially abundant taxonomic features across different populations. Metagenomic sequencing data are usually summarized into a high-dimensional count table, which suffers from sample heterogeneity, unknown mean-variance structure, and excess zeros. These characteristics often hamper downstream analysis and thus require specialized analytical models. In this paper, we propose a Bayesian bi-level framework to identify a set of differentially abundant taxa, which could potentially serve as microbial biomarkers for diagnosing diseases. The bottom level is a multivariate count generative model that links the observed counts in each sample to their latent normalized abundances. When a zero-inflated negative binomial model is chosen as the bottom level, we use the Dirichlet process as a flexible nonparametric mixing distribution to model all latent factors that account for sample heterogeneity. The top level is a Gaussian mixture model with a feature selection scheme for identifying those taxa whose normalized abundances are discriminatory between different phenotype groups. The model further employs Markov random field priors to incorporate taxonomic tree information to identify microbial biomarkers at different taxonomic ranks. A colorectal cancer case study demonstrates that a diagnostic model trained on the microbial signatures that our model identifies in one cohort can significantly improve on current predictive performance in an independent cohort. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies and elucidating disease etiology.
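
For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution mentioned in the abstract, the following is a minimal sketch of its probability mass function. It is not the speaker's implementation. The parameter names (pi, mu, phi), the mean-dispersion parameterization, and the idea of writing the mean as a size factor times a normalized abundance are illustrative assumptions, not details taken from the talk.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu > 0 and dispersion phi > 0.

    Assumed parameterization: variance = mu + mu**2 / phi.
    """
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


def zinb_pmf(y, pi, mu, phi):
    """P(Y = y) under a zero-inflated negative binomial.

    With probability pi the count is an "extra" zero; otherwise it is
    drawn from the negative binomial above.
    """
    nb = math.exp(nb_log_pmf(y, mu, phi))
    extra_zero = pi if y == 0 else 0.0
    return extra_zero + (1.0 - pi) * nb


# Hypothetical example: the mean is a sample-specific size factor times a
# latent normalized abundance (both values are made up for illustration).
size_factor, abundance = 2.5, 2.0
p_zero = zinb_pmf(0, pi=0.3, mu=size_factor * abundance, phi=2.0)
p_zero_nb = math.exp(nb_log_pmf(0, size_factor * abundance, 2.0))
```

In this sketch the zero-inflation weight pi raises the probability of observing a zero above what the negative binomial alone would give, which is one common way to represent the excess zeros described in the abstract.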
