Treebanks: Formats, Tools and Usage

a short course at the University of Zurich

June/July, 2005

by Martin Volk (Prof. of Computational Linguistics, Stockholm University)

Contents

Treebanks have become valuable resources in natural language processing (NLP) in recent years. A treebank is a collection of syntactically annotated sentences in which the annotation has been manually checked, so that the treebank can serve as a training corpus for natural language parsers, as a repository for linguistic research, or as an evaluation corpus for NLP systems. The course will introduce the processes involved in creating and exploiting treebanks. We will give an overview of the annotation formats used in different treebanks (e.g. the English Penn Treebank, the Swedish treebank SynTag, the German TIGER Treebank and the Danish Dependency Treebank). We will demonstrate some tools used for creating treebanks (tree editors), for consistency checking in treebanks and for treebank searches. We will also look into the many uses of treebanks, ranging from machine learning to system evaluation. The course will conclude with a look at the future of treebanks, in particular parallel treebanks.
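To give a first impression of what "syntactically annotated" can mean in practice, here is a minimal sketch in Python that reads one sentence in a bracketed notation of the kind used by the Penn Treebank. The example sentence, its labels and the function names `parse_bracketed` and `leaves` are illustrative assumptions and are not taken from the course material. Real treebank files contain further details (function tags, traces, an extra pair of outer parentheses) that this sketch does not handle.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Children are either further (label, children) tuples or plain strings
    (the words of the sentence). Assumes well-formed input.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]          # category label, e.g. S, NP, NN
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # embedded constituent
            else:
                children.append(tokens[pos])    # a word (leaf)
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree in sentence order."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Hypothetical example sentence, annotated by hand for this illustration.
example = "(S (NP (DT the) (NN course)) (VP (VBZ introduces) (NP (NNS treebanks))))"
tree = parse_bracketed(example)
print(tree[0])                 # root category of the sentence
print(" ".join(leaves(tree)))  # the plain sentence without annotation
```

The nested tuples keep the constituent structure explicit, which is the information that tree editors display and that search tools query. Other treebanks, such as dependency treebanks, store different structures and would need a different reader.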

Intended Participants

The course is intended for advanced students in Computational Linguistics, Linguistics or some related discipline. It will be especially rewarding for students interested in modern annotation methods in Corpus Linguistics.

Language

The course will be taught in German (with slides in English). The exercises will use examples from German and English.

Schedule

Time / Room | Topic | Resources
Monday, 27. June, SOE-F-1 | Treebank Definition and Overview (Annotation Formats), Treebank Tools (Editors, Search Tools) |
Tuesday, 28. June, Computer room G-28 at IFI | Exercise: Treebank Annotation using ANNOTATE | ANNOTATE quick reference guide; TIGER Annotationsschema
Thursday, 30. June, SOE-F-1 | Treebank Usage (Training a Parser on a Treebank), Treebanks in the Future (Parallel Treebanks) |
Friday, 1. July, Computer room G-28 at IFI | Exercise: Evaluating a Chunker or Parser against a Treebank |

 


Last modified: 30. June 2005