Northwestern University

EECS Seminar Speaker: Dr. Dong Deng, Postdoctoral Associate in the CS & AI Lab (CSAIL) at MIT, "Data Curation at Scale"

When: Monday, February 19, 2018
2:00 PM - 3:00 PM  

Where: Ford Motor Company Engineering Design Center, ITW Room, 2133 Sheridan Road, Evanston, IL 60208

Audience: Faculty/Staff - Student - Public - Post Docs/Docs - Graduate Students

Contact: Brianna Mello  

Group: Electrical Engineering & Computer Science

Category: Lectures & Meetings

Description:

The EECS Department welcomes Dr. Dong Deng, Postdoctoral Associate in the CS & AI Lab (CSAIL) at MIT.

Deng will present a talk titled "Data Curation at Scale" on Monday, February 19 at 2:00 PM in the Ford ITW Room.

Abstract: Data curation (ingest, transformation, cleaning, schema mapping, deduplication, and consolidation) of raw data sets consumes up to 80% of a data scientist’s time. Integrating silos of enterprise data is also a major challenge for business users. To address these issues, we have built an end-to-end data curation system, Data Civilizer, in cooperation with the Qatar Computing Research Institute.
In this talk, I will start with a brief introduction to the Data Civilizer system. Then I will discuss two of the components that I have built. First, I will discuss entity consolidation in Data Civilizer. This module accepts a collection of clusters of records thought to represent the same entity (i.e., duplicates) and merges each cluster into a single “golden” record. Next, I will show how to address the key challenges in making entity matching scalable in Data Civilizer. Finally, I will conclude the talk with my vision for end-user data curation and for managing massive data lakes.
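
To make the entity consolidation step in the abstract concrete, here is a minimal sketch of merging one cluster of duplicate records into a single "golden" record. It is an illustration only and is not taken from Data Civilizer: the function name `golden_record`, the dictionary record layout, and the per-field majority-vote rule are all assumptions chosen for simplicity.

```python
from collections import Counter


def golden_record(cluster):
    """Merge a cluster of duplicate records into one record.

    Hypothetical helper: each record is a dict, and each field of the
    result is the most frequent non-empty value seen in the cluster
    (simple majority vote, an assumed rule for this sketch).
    """
    # Collect field names in first-seen order across all records.
    fields = []
    for record in cluster:
        for key in record:
            if key not in fields:
                fields.append(key)

    merged = {}
    for field in fields:
        # Ignore missing and empty values when voting.
        values = [r[field] for r in cluster if r.get(field) not in (None, "")]
        merged[field] = Counter(values).most_common(1)[0][0] if values else None
    return merged


# Made-up example cluster of three records believed to be duplicates.
cluster = [
    {"name": "Acme Corp", "city": "Evanston", "phone": ""},
    {"name": "Acme Corp", "city": "", "phone": "555-0100"},
    {"name": "ACME Corporation", "city": "Evanston", "phone": "555-0100"},
]
print(golden_record(cluster))
```

Majority voting is only one possible merge rule; a production system would likely also need to reconcile variant spellings of the same value before voting.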

Bio: Dr. Dong Deng is a postdoctoral associate in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, where he works with Prof. Michael Stonebraker and Prof. Samuel Madden. He is interested in data management and data science, with a special focus on the theoretical and system-building challenges in data curation. He obtained his PhD from Tsinghua University in 2016 with the highest doctoral dissertation award. He has also received scholarships from the Siebel Foundation, Google, Microsoft, Intel, and the Boeing Company, and publishes regularly in top venues including SIGMOD, PVLDB, and ICDE.

Hosted by CS Division
