11 October 2015

Call for papers: Dealing with Bad Data

Workshop at the Meertens Institute, Amsterdam (the Netherlands), March 17-19, 2016.

Call for Papers 

In recent years, linguistic theory has significantly expanded its empirical scope. If it was ever true that theories were built exclusively on researchers' own armchair judgements, this is certainly no longer the case. Increasingly, researchers have turned their attention to databases and corpora of all kinds, to experimental results, and to many other types of data sources. This development has gone hand in hand with an expansion of the scope of the theories and with new collaborations with, e.g., historical linguists, dialectologists, sociolinguists, and psycholinguists.

We believe that this is a positive development, but we also believe that some of the issues it raises have not been sufficiently discussed. In this workshop we aim to tackle the question of how to deal with ‘bad data’: much of the data we work with was not collected with the questions we want to ask in mind. For example, we may have to rely on the results of a dialect survey conducted a few decades ago because there is no funding for a new survey, or certain data may simply be missing for a particular historical period.

We invite papers on all these issues. Which problems do you encounter in your work, and how do you solve them? Is there a privileged type of data for answering certain questions? Do we need more methodological standards, and if so, what should they look like? How can we ensure that we maintain an integrated theory in which the results of different kinds of empirical exploration can all be accommodated? What is the relation between our methodological choices and the central hypotheses of the theory of mental grammar?

Sketch of the issues 

We distinguish between (at least) four classes of problems: (i) Incomplete data, (ii) Noisy data, (iii) One-sided data, (iv) Conflicting data.  

Ad (i): As mentioned above, in both historical and dialectological surveys, data from some period or region may be missing, either because (in dialectology) we have no data at all for some area, or because the gaps in the data differ from area to area. Also, much typological work has taken ‘typological gaps’ to be significant: if a certain phenomenon does not occur in any language, the theory should be restricted accordingly. This idea has come under attack: the languages that have actually been studied in detail are probably not a representative sample of all the world's languages, and these in turn stand in an unknown relationship to the set of all possible languages.

Ad (ii): Generative grammar (and many related types of theorizing) has been based on the distinction between I-language and E-language (or competence and performance), restricting the object of research to I-language, partly for reasons of manageability: E-language is influenced by too many complicated factors. Although native speaker judgements cannot be said to be ‘pure’ reflections of I-language either, it seems clear that the ‘new’ kinds of data do show the influence of many kinds of noise.

Ad (iii): In many cases, the available data do not show everything we need to know. For instance, a historical corpus can only tell us that certain constructions did occur, not whether the constructions we do not find were ungrammatical or simply absent by accident. We would need to complement the corpus with judgement data, but such data are obviously unavailable. Conversely, it is sometimes argued that judgements alone are not enough and need to be complemented by, e.g., Google data on actual occurrences.

Ad (iv): When we combine data from different types of sources in our research, e.g. judgement data and language use data, these may show conflicting patterns. How do we resolve such conflicts?

Invited speakers 

The following speakers have confirmed their participation: Paul de Lacy, Paula Fikkert, Paul Kiparsky, Cecilia Poletto, Keren Rice, Christina Tortora, Jeroen Van Craenenbroeck, Charles Yang. We hope to add a few more names to this list.

Submission of abstracts 

You can submit a 2-page abstract (excluding references) for a 30-minute talk via EasyChair before December 1, 2015. You may submit multiple abstracts, but we may decide to accept only one abstract per author, in which case the choice will be ours.

Deadline of submission: December 1, 2015

Notification of acceptance: December 22, 2015

Website address: http://www.meertens.knaw.nl/baddata/