In the UIS-project we are building a University Information System that answers questions about student registration at the University of Zurich. So far these questions have been compiled into a FAQ and a glossary. The idea is similar to the ExtrAns-project: we first process natural language text and represent it as logical formulae. We then allow the user to ask a natural language question, process it in the same manner, and reason over the logical formulae to find the answer. In contrast to ExtrAns, the UIS-system deals with German rather than English. Since fewer NLP modules are available for German than for English, the goals of the UIS-project need to be more modest than those of ExtrAns, and we will have to restrict the input language to an appropriate level.
A central module of the UIS-system is an efficient parser for German. This module has to process German sentences and deliver a functional structure that serves as input to the translation into logic. The parsing module works in the following sequence. It receives a string and has the tokenizer split it into separate units. A preprocessor then checks for special expressions such as dates, times or currency amounts. The remaining words in the input are passed to Gertwol, a commercial morphological analyzer, which provides part-of-speech and inflectional information for every wordform. Since Gertwol is purely wordform-based, it produces multiple readings for most wordforms. A tagger is therefore used to disambiguate the Gertwol output before parsing starts.
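The first two steps of this sequence (tokenizing and recognizing special expressions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `tokenize` and `preprocess`, the labels, and the date, time and currency formats are hypothetical and not taken from the UIS-system. The morphological analysis and tagging steps are not shown, since they depend on Gertwol and on the tagger.

```python
import re

# Hypothetical formats; the actual expressions handled by the
# UIS preprocessor are not specified in the text.
DATE_RE = re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}")   # e.g. 15.10.1998
TIME_RE = re.compile(r"\d{1,2}:\d{2}")             # e.g. 16:00
AMOUNT_RE = re.compile(r"\d+(?:\.\d{2})?")         # e.g. 50 or 50.00
CURRENCY_UNITS = {"CHF"}                           # assumed currency marker


def tokenize(text):
    """Split a string into units, keeping dots and colons inside numbers."""
    return re.findall(r"\d+(?:[.:]\d+)*|\w+|[^\w\s]", text)


def preprocess(tokens):
    """Label special expressions; all other tokens become WORD or PUNCT."""
    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A currency unit followed by an amount is merged into one unit.
        if tok in CURRENCY_UNITS and nxt is not None and AMOUNT_RE.fullmatch(nxt):
            result.append(("CURRENCY", tok + " " + nxt))
            i += 2
            continue
        if DATE_RE.fullmatch(tok):
            label = "DATE"
        elif TIME_RE.fullmatch(tok):
            label = "TIME"
        elif tok[0].isalnum():
            label = "WORD"   # would be passed on to morphological analysis
        else:
            label = "PUNCT"
        result.append((label, tok))
        i += 1
    return result
```

Under these assumptions, `preprocess(tokenize("Die Anmeldung endet am 15.10.1998 um 16:00 Uhr und kostet CHF 50."))` labels `15.10.1998` as `DATE`, `16:00` as `TIME` and `CHF 50` as a single `CURRENCY` unit, while the remaining words are left for the morphology component.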
Since Gertwol does not provide any valency information, this must be extracted from other lexical sources. We use the valency information from CELEX, a lexical database available on CD-ROM. The data provided by CELEX are a good starting point, but they must be manually cleaned and extended, since they do not include exact information on, for example, reflexivity or prepositional complements. Words unknown to Gertwol are entered into a full-form lexicon.
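To make concrete what kind of information such an extended valency lexicon has to carry, here is a small sketch. The entry format, the complement labels and the function `licensed` are hypothetical; the entries are hand-written examples and are not taken from CELEX.

```python
# Hypothetical valency entries: each verb lemma maps to a list of frames.
# A frame records reflexivity and the required complements, including
# the preposition of a prepositional complement.
VALENCY = {
    "besuchen": [{"reflexive": False, "complements": ("subj", "acc")}],
    "warten": [{"reflexive": False, "complements": ("subj", "pp:auf")}],
    "anmelden": [{"reflexive": True, "complements": ("subj", "pp:für")}],
}


def licensed(lemma, complements, reflexive=False):
    """Return True if some frame of the verb matches the observed complements."""
    observed = sorted(complements)
    return any(
        frame["reflexive"] == reflexive
        and sorted(frame["complements"]) == observed
        for frame in VALENCY.get(lemma, [])
    )
```

With these assumed entries, `licensed("anmelden", ["subj", "pp:für"], reflexive=True)` holds, whereas the same complements without the reflexive pronoun are rejected. This is the kind of distinction that, according to the text, has to be added to the CELEX data by hand.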
The parser is a bottom-up chart parser for ID/LP rules (immediate dominance/linear precedence) and PS rules (phrase structure). If it fails to parse the complete sentence, parsing fragments can be retrieved from the chart and processing can continue with them.
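The following sketch illustrates bottom-up chart parsing with fragment retrieval on a toy scale. It is not the UIS parser: it handles only binary PS rules over part-of-speech tags (a CYK-style chart), does not cover ID/LP rules, and records categories rather than full structures. The grammar `RULES`, the tag names and the greedy longest-edge strategy in `fragments` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar: binary PS rules as (parent, left, right).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "DET", "N"),
    ("VP", "V", "NP"),
]


def parse(tags):
    """Fill the chart bottom-up; chart[(i, j)] holds categories spanning tags[i:j]."""
    n = len(tags)
    chart = defaultdict(set)
    for i, tag in enumerate(tags):
        chart[(i, i + 1)].add(tag)
    # Combine shorter edges into longer ones, smallest spans first.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        chart[(i, j)].add(parent)
    return chart


def fragments(chart, n):
    """Cover the input left to right, always taking the longest available edge."""
    result = []
    i = 0
    while i < n:
        for j in range(n, i, -1):
            if chart[(i, j)]:
                result.append((i, j, sorted(chart[(i, j)])))
                i = j
                break
    return result
```

For the tag sequence `["DET", "N", "V", "DET", "N"]` the chart contains an `S` edge over the whole input. For the incomplete sequence `["DET", "N", "V", "DET"]` no such edge exists, and `fragments` returns an `NP` over the first two tags followed by the single tags `V` and `DET`, which a later processing step could still use.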
The translation into logical formulae starts with the functional structure delivered by the parser, which contains information about the subject, objects and modifiers. In a first step, these are translated into ``Quasi Logical Forms'' (i.e. representations without scope information for quantifying expressions and without resolved anaphoric references), and then into logical forms in a knowledge representation language based on Horn clause logic.
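The two-step translation can be sketched as follows. The representation is a strong simplification and entirely hypothetical: the dictionary layout of the functional structure, the `qterm` tuples, the functions `to_qlf` and `to_facts`, and the Prolog-like output syntax are assumptions, not the formats used in the UIS-system. The sketch handles only existentially quantified subjects and objects, and ignores modifiers and anaphora.

```python
from itertools import count


def to_qlf(fstruct):
    """Build a quasi logical form: quantified terms stay in place, unscoped."""
    args = []
    for role in ("subj", "obj"):
        if role in fstruct:
            np = fstruct[role]
            args.append(("qterm", np["det"], np["noun"]))
    return (fstruct["pred"], args)


def to_facts(qlf):
    """Replace each existential qterm by a fresh constant and emit ground facts."""
    fresh = count(1)
    pred, args = qlf
    facts, consts = [], []
    for _, det, noun in args:
        if det != "exists":
            # Other quantifiers would need real scoping, which is omitted here.
            raise ValueError("only existential determiners are handled in this sketch")
        const = f"c{next(fresh)}"
        consts.append(const)
        facts.append(f"{noun}({const}).")
    facts.append(f"{pred}({', '.join(consts)}).")
    return facts
```

For a functional structure such as `{"pred": "besuchen", "subj": {"det": "exists", "noun": "student"}, "obj": {"det": "exists", "noun": "vorlesung"}}`, the first step keeps both noun phrases as unscoped `qterm` entries, and the second step yields the facts `student(c1).`, `vorlesung(c2).` and `besuchen(c1, c2).`, each of which is a Horn clause without a body.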
The UIS-system is meant to be interfaced to an HTML page. In this way we ensure that it can be tested over the WWW in a real information retrieval situation. The dialogues will be saved and can be used to tune the system to the users' information needs.