a short course at the University of Zurich
by Martin Volk (Prof. of Computational Linguistics, Stockholm University)
Treebanks have become valuable resources in natural language processing (NLP) in recent years. A treebank is a collection of syntactically annotated sentences in which the annotation has been manually checked, so that the treebank can serve as a training corpus for natural language parsers, as a repository for linguistic research, or as an evaluation corpus for NLP systems. The course will introduce the processes involved in creating and exploiting treebanks. We will give an overview of the annotation formats used in different treebanks (e.g. the English Penn Treebank, the Swedish treebank SynTag, the German TIGER Treebank and the Danish Dependency Treebank). We will demonstrate tools for creating treebanks (tree editors), for checking their consistency and for searching them. We will also look at the many uses of treebanks, ranging from machine learning to system evaluation. The course will conclude with a look at the future of treebanks, in particular parallel treebanks.
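To make the idea of an annotation format concrete, here is a minimal Python sketch that reads one sentence in the bracketed style associated with the Penn Treebank. The example sentence, its tags and the function names (`parse_tree`, `leaves`) are illustrative assumptions rather than course material, and real treebank files contain further conventions (traces, function labels, empty elements) that this sketch ignores.

```python
# Minimal sketch: read one Penn-Treebank-style bracketed tree.
# The sentence and all function names are illustrative assumptions.

def tokenize(text):
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(text):
    """Return the tree as nested (label, children) tuples; leaves are strings."""
    tokens = tokenize(text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]           # category or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # word form
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Collect the word forms of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


example = "(S (NP (DT The) (NN course)) (VP (VBZ introduces) (NP (NNS treebanks))))"
tree = parse_tree(example)
print(tree[0])        # root category
print(leaves(tree))   # the sentence as a list of words
```

The nested-tuple representation is only one possible choice; dependency treebanks such as the Danish Dependency Treebank would instead be modelled as head-dependent relations between words.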
The course is intended for advanced students in Computational Linguistics, Linguistics or a related discipline. It will be especially rewarding for students interested in modern annotation methods in Corpus Linguistics.
The course will be taught in German (with slides in English). The exercises will use examples from German and English.
Time / Room | Topic | Resources
Monday, 27 June, SOE-F-1 | Treebank Definition and Overview (Annotation Formats), Treebank Tools (Editors, Search Tools) |
Tuesday, 28 June, Computer room G-28 at IFI | Exercise: Treebank Annotation using ANNOTATE | ANNOTATE quick reference guide; TIGER Annotationsschema (TIGER annotation scheme)
Thursday, 30 June, SOE-F-1 | Treebank Usage (Training a Parser on a Treebank), Treebanks in the Future (Parallel Treebanks) |
Friday, 1 July, Computer room G-28 at IFI | Exercise: Evaluating a Chunker or Parser against a Treebank |
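As a rough illustration of what evaluating a chunker or parser against a treebank can involve, the following Python sketch computes labeled bracket precision, recall and F1 for a single sentence. The choice of this metric, the `(label, start, end)` span layout and the example spans are assumptions made for illustration; the course exercise may use a different measure or tool.

```python
# Minimal sketch: labeled bracket precision/recall for one sentence.
# Spans are (label, start, end) tuples over word positions; this layout
# and the example data are assumptions, not taken from the course.
from collections import Counter


def bracket_scores(gold, predicted):
    """Compare predicted constituents with gold-standard constituents."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a span counts as correct only as often
    # as it occurs in the gold annotation.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold spans from the treebank versus spans produced by a parser.
gold = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]
predicted = [("S", 0, 4), ("NP", 0, 2), ("VP", 1, 4)]

precision, recall, f1 = bracket_scores(gold, predicted)
print("precision=%.3f recall=%.3f f1=%.3f" % (precision, recall, f1))
```

In this example two of the three predicted spans match the gold annotation, and two of the four gold spans are found. For a chunker, the same comparison applies to flat chunk spans instead of nested constituents.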
Last modified: 30. June 2005