Northwestern Events Calendar


Zhihan Zhou Final Defense Monday, February 24th

When: Monday, February 24, 2025
3:30 PM - 5:00 PM CT

Where: Mudd Hall (formerly Seeley G. Mudd Library), 3514, 2233 Tech Drive, Evanston, IL 60208
Webcast Link (Hybrid)

Audience: Faculty/Staff - Student - Post Docs/Docs - Graduate Students

Contact: Wynante R Charles   (847) 467-8174

Group: Department of Computer Science (CS)

Category: Academic

Description:

Deciphering the Language of DNA with Genome Foundation Models

Abstract:

Deciphering the language of the genome is a fundamental challenge with transformative implications across multiple critical domains. This thesis aims to advance genome analysis and synthesis through the introduction of genome foundation models (gFMs), general-purpose genomic models designed to address a broad spectrum of genomics and metagenomics problems.

We first demonstrate the promise of self-supervised DNA modeling with DNABERT, the first gFM that outperforms traditional methods in diverse prediction tasks. Building on these insights, we introduce DNABERT-2, incorporating compact sequence representations, modern computational techniques, and multi-species genomes to enhance the applicability, efficiency, and effectiveness of discriminative gFMs.

Moving beyond analysis, we further explore generative modeling in genomics through GenomeOcean, a 4-billion-parameter generative gFM trained on massive metagenomic assemblies. GenomeOcean exhibits a profound understanding of protein functions and higher-order genomic functional modules by generating novel and realistic sequences under diverse prompts. Finally, we propose a generalizable framework for producing effective DNA embeddings tailored to biologically meaningful relationships.

Together, this new class of models opens new avenues and provides a robust foundation for advancing genomics and metagenomics research.

 
