Adding Manual Constraints and Lexical Look-up to a Brill-Tagger for German

Gerold Schneider and Martin Volk, University of Zurich

Published in ESSLLI-98 Workshop on Recent Advances in Corpus Annotation. Saarbrücken: 1998.

Abstract

We have trained the rule-based Brill-Tagger for German. In this paper we show how tagging performance improves with increasing corpus size. Training on a corpus of only 28,500 words results in an error rate of around 5% on unseen text. In addition, we demonstrate that the error rate can be reduced by looking up unknown words in an external lexicon and by manually adding rules to the rule set learned by the tagger. We thus obtain an error rate of 2.79% on the reference corpus to which the manual rules were tuned. On a second, general reference corpus, lexical look-up and manual rules lead to an error rate of 4.13%.
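The abstract does not give the actual rules or lexicon. The following is therefore only a minimal Python sketch of the general pipeline it describes, built on several assumptions. Words seen in training get their lexicon tag. Unknown words are looked up in an external lexicon. Words found in neither get a default tag of NN. Ordered contextual transformation rules then correct the tags, and manual rules are simply appended after the learned ones. The function names (`initial_tags`, `apply_rules`), the single "previous tag" rule template, the simultaneous rule application, and all lexicon entries and rules are hypothetical illustrations using STTS-style tags, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical contextual transformation: retag from_tag as to_tag
    when the preceding token currently has prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens, training_lexicon, external_lexicon, default="NN"):
    """Initial-state tagging, with lexical look-up as a fallback for unknown words."""
    tags = []
    for tok in tokens:
        if tok in training_lexicon:      # word seen in the training corpus
            tags.append(training_lexicon[tok])
        elif tok in external_lexicon:    # unknown word: consult the external lexicon
            tags.append(external_lexicon[tok])
        else:                            # neither: fall back to an assumed default tag
            tags.append(default)
    return tags


def apply_rules(tags, rules):
    """Apply rules in order. Each rule is matched against the tags as they were
    before that rule ran (simultaneous application, an assumption of this sketch)."""
    for rule in rules:
        new_tags = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                new_tags[i] = rule.to_tag
        tags = new_tags
    return tags


# Hypothetical data with STTS-style tags; none of it is taken from the paper.
training_lexicon = {"das": "ART", ",": "$,", "ich": "PPER"}
external_lexicon = {"Haus": "NN"}
learned_rules = [Rule("NN", "VVFIN", "PPER")]  # stands in for rules learned from the corpus
manual_rules = [Rule("ART", "PRELS", "$,")]    # hand-written: "das" after a comma as relative pronoun

tokens = ["das", "Haus", ",", "das", "ich", "kaufe"]
tags = apply_rules(initial_tags(tokens, training_lexicon, external_lexicon),
                   learned_rules + manual_rules)
print(list(zip(tokens, tags)))
```

In this toy run, "Haus" is tagged NN via the external lexicon. "kaufe" starts with the default NN and is corrected to VVFIN by the learned rule. The manual rule retags the second "das" as PRELS. The manual rule is deliberately crude, because an article can also follow a comma. This illustrates why hand-written rules have to be checked against a reference corpus.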
