
Zhihan Zhou Final Defense Monday, February 24th

Monday, February 24, 2025 | 3:30 PM - 5:00 PM CT
Mudd Hall (formerly Seeley G. Mudd Library), Room 3514, 2233 Tech Drive, Evanston, IL 60208
Hybrid event (webcast available)

Deciphering the Language of DNA with Genome Foundation Models

Abstract:

Deciphering the language of the genome is a fundamental challenge with transformative implications across multiple critical domains. This thesis aims to advance genome analysis and synthesis by introducing genome foundation models (gFMs), general-purpose genomic models designed to address a broad spectrum of problems in genomics and metagenomics.

We first demonstrate the promise of self-supervised DNA modeling with DNABERT, the first gFM to outperform traditional methods across diverse prediction tasks. Building on these insights, we introduce DNABERT-2, which incorporates compact sequence representations, modern computational techniques, and multi-species genomes to improve the applicability, efficiency, and effectiveness of discriminative gFMs.

Moving beyond analysis, we explore generative modeling in genomics through GenomeOcean, a 4-billion-parameter generative gFM trained on massive metagenomic assemblies. By generating novel, realistic sequences under diverse prompts, GenomeOcean shows a deep understanding of protein functions and higher-order genomic functional modules. Finally, we propose a generalizable framework for producing effective DNA embeddings that capture biologically meaningful relationships.

Together, this new class of models opens new avenues for, and provides a robust foundation for, advancing genomics and metagenomics research.


Audience

  • Faculty/Staff
  • Student
  • Post Docs/Docs
  • Graduate Students

Contact

Wynante R Charles
(847) 467-8174

Interest

  • Academic (general)
