Brendan O'Connor (CMU) will give a talk entitled “Statistical Text Analysis for Social Science” today, February 10, at
4 PM in Computer Science Building 150. An abstract follows.
What can text analysis tell us about society? Corpora of news, books, and social media encode human beliefs and culture. But it is impossible for a researcher to read all of today's rapidly growing text archives. My research develops statistical text analysis methods that measure social phenomena from textual content, especially in news and social media data. For example: How do changes in public opinion appear in microblogs? What topics get censored on the Chinese Internet? What character archetypes recur in movie plots? How do geography and ethnicity affect the diffusion of new language? In order to answer these questions effectively, we must apply and develop scientific methods in statistics, computation, and linguistics.
In this talk I will illustrate these methods in a project that
analyzes events in international politics. Political scientists are
interested in studying international relations through *event data*:
time series records of who did what to whom, as described in news
articles. To address this event extraction problem, we develop an
unsupervised Bayesian model of semantic event classes, which learns the verbs and textual descriptions that correspond to types of
diplomatic and military interactions between countries. The model
uses dynamic logistic normal priors to drive the learning of semantic
classes; but unlike a topic model, it leverages deeper linguistic
analysis of syntactic argument structure. Using a corpus of several
million news articles over 15 years, we quantitatively evaluate how
well its event types match ones defined by experts in previous work,
and how well its inferences about countries correspond to real-world
conflict. The method also supports exploratory analysis, for example
of the recent history of Israeli-Palestinian relations.