CL-LogoTagger for Estonian

During the Second Swiss Estonian Workshop on Computational and Theoretical Linguistics, one of the project groups trained a tagger for Estonian. It is based on Brill's tagger. The tagset, developed at the workshop, consists of 63 tags, 13 of which are for punctuation. The tagger was trained on a manually tagged and corrected corpus of Estonian fiction of about 7,000 words.

The tagger is still at an early experimental stage and can be tested here.


Enter your Estonian sentence here:

Enter diacritics directly (ä, ö, ü, õ, š, ž)
or in prefix-like notation ("a=ä, "o=ö, "u=ü, ~o=õ, ^s=š, ^z=ž).
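
The prefix notation above can be converted to the actual characters with a simple substitution. The following is a minimal sketch, not the code used by the tagger: the names `PREFIX_TO_DIACRITIC` and `decode_prefix_notation` are hypothetical, and the handling of uppercase letters (e.g. `~O` for `Õ`) is an assumption, since only the lowercase forms are listed here.

```python
import re

# Lowercase mappings exactly as listed on this page.
PREFIX_TO_DIACRITIC = {
    '"a': "ä",
    '"o': "ö",
    '"u': "ü",
    "~o": "õ",
    "^s": "š",
    "^z": "ž",
}

# Assumption (not stated on the page): uppercase letters use the same prefixes.
PREFIX_TO_DIACRITIC.update(
    {
        prefix[0] + prefix[1].upper(): char.upper()
        for prefix, char in list(PREFIX_TO_DIACRITIC.items())
    }
)

# One alternation over all two-character prefix sequences.
_PREFIX_PATTERN = re.compile("|".join(re.escape(p) for p in PREFIX_TO_DIACRITIC))


def decode_prefix_notation(text: str) -> str:
    """Replace prefix sequences such as "a or ~o with the accented character."""
    return _PREFIX_PATTERN.sub(lambda m: PREFIX_TO_DIACRITIC[m.group(0)], text)


if __name__ == "__main__":
    print(decode_prefix_notation('T"o"o on l~oppenud'))
```

Characters that are already entered directly pass through unchanged. Note that this sketch would also rewrite a genuine quotation mark that happens to precede `a`, `o` or `u`, so it is only suitable for input that uses `"` exclusively as a prefix.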


Authors: Gerold Schneider (gschneid@ifi.unizh.ch) and Beat Vontobel (bvontob@cl.unizh.ch)
Date of last modification: