BEGIN:VCALENDAR
CALSCALE:GREGORIAN
METHOD:PUBLISH
VERSION:2.0
PRODID:-//PlanIt Purple//EN
BEGIN:VEVENT
STATUS:CONFIRMED
LAST-MODIFIED:20081017T154401
URL:http://www.northwestern.edu/nico/events/index.html#seminars
PRIORITY:0
CLASS:PUBLIC
UID:357634@northwestern.edu
SUMMARY:Wednesdays @ NICO: Eduardo Altmann - Language & Social Behavior in Usenet Groups
DESCRIPTION:Eduardo G. Altmann\, NICO. Abstract: Before the WWW\, blogs\, and IMs there were Usenet groups. This distributed Internet discussion system has been used for collective worldwide communication for the past three decades\, building a detailed database of the interactions between millions of users. In this talk I will discuss how questions from linguistics and social behavior can be studied using Usenet groups. I start with a general characterization of the groups (e.g.\, the distributions of posts per user and posts per thread have heavy tails\, but the lifetime of users decays exponentially). I then discuss how the time-dependent frequency of usage of specific words can be used to quantify the popularity of words\, e.g.\, internet slang\, products\, persons\, or events. In the main application discussed in this seminar\, I will take advantage of the amount of data available (~15 years and ~100\,000\,000 words in each group) to introduce a statistical characterization of words that goes beyond the frequency of usage. Based on the distance between successive occurrences of a word\, I will show that different parts of speech (in the same frequency range) can have different statistical properties: while function words follow approximately a Poisson process\, content words consistently diverge from a Poisson process for both short and long distances. This motivates the definition of the area A between the measured and the Poisson distributions as a characteristic of the word usage in each group. Words with large A provide a good characterization of the discussion topics of the group. In terms of the different parts of speech\, we find the following order for decreasing A: proper nouns\, common nouns\, adjectives\, verbs\, and prepositions. This suggests a connection between the A-score and the semantic content of the words. NICO Coffee Hour will follow for questions\, networking\, and collaboration. http://www.northwestern.edu/nico/events/index.html#seminars
DTSTART:20081022T120000
DTEND:20081022T140000
CREATED:20081009T000000
DTSTAMP:20081009T000000
SEQUENCE:0
LOCATION:Evanston
END:VEVENT
END:VCALENDAR