Anton Ingason will give a tutorial on using annotated corpora for syntactic research on Thursday, April 25, from 2:30--5:00 in the Partee Room. The last hour will be a hands-on tutorial. A brief abstract follows.
Using Annotated Corpora for Syntactic Research
Parsed corpora are powerful tools for collecting quantitative evidence for syntactic research. This introduction to the use of parsed corpora will focus on the following:
- Appropriate and inappropriate uses of a parsed corpus
- Types of results that have been made possible by currently available corpora
- The annotation scheme of the Penn Parsed Corpora for Historical English (which is being applied to an increasing number of languages with minor modifications)
- How to run your own queries
For the last and most important part, we will use the Icelandic Parsed Historical Corpus (IcePaHC), which is freely available. You do not need to know Icelandic for the purposes of the tutorial! It would be useful to download the corpus and set it up on your laptop in advance.
Download IcePaHC 0.9 (Windows or platform-independent version) from:
http://www.linguist.is/icelandic_treebank/Download
The Windows version has an automatic installer. The platform-independent version (Mac OS or Linux) requires Java: open a terminal in the directory where you saved the corpus and run:
java -jar corpald-icepahc-0.9.jar
Useful materials:
CorpusSearch Users Guide (especially "Query Language" section):
- http://corpussearch.sourceforge.net/CS-manual/Contents.html
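As a rough, unofficial illustration of what a query over a parsed corpus does, here is a small Python sketch. It is not CorpusSearch and does not use its query language. It parses one labeled bracketing and finds clauses (labels matching IP*) that immediately dominate a subject (NP-SBJ*), which corresponds roughly to an immediate-dominance search in CorpusSearch. The tree is a hypothetical English sentence with labels in the style of the Penn Parsed Corpora, not real IcePaHC data. All function names here are made up for illustration. Real corpus files contain many trees each and need more robust handling.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # Node objects or word strings

def tokenize(text):
    # Split a labeled bracketing into "(", ")" and atom tokens.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    # Recursive-descent parse of one "(LABEL child ...)" constituent.
    # The outer wrapper "( (IP-MAT ...) (ID ...))" has no label, so label may be "".
    pos = 0
    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return Node(label, children)
    return node()

def walk(node):
    # Yield every constituent in the tree, top-down.
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)

def idoms(tree, parent_pat, child_pat):
    # Nodes matching parent_pat with a daughter matching child_pat ("*" = wildcard).
    return [n for n in walk(tree)
            if fnmatchcase(n.label, parent_pat)
            and any(isinstance(c, Node) and fnmatchcase(c.label, child_pat)
                    for c in n.children)]

def words(node):
    # Collect the leaf words dominated by a node, in order.
    out = []
    for c in node.children:
        out.extend(words(c) if isinstance(c, Node) else [c])
    return out

# Hypothetical example tree (not from any corpus).
SAMPLE = """
( (IP-MAT (NP-SBJ (PRO She))
          (VBD read)
          (NP-OB1 (D the) (N book))
          (. .))
  (ID EXAMPLE.1))
"""

tree = parse(tokenize(SAMPLE))
for hit in idoms(tree, "IP*", "NP-SBJ*"):
    print(hit.label, ":", " ".join(words(hit)))  # IP-MAT : She read the book .
```

Counting such hits across thousands of trees, and comparing them across time periods or texts, is the kind of quantitative evidence the tutorial is about.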