WebExtrAns

WebExtrAns was a privately funded project that ran from Nov. 1999 to Dec. 2002. Like ExtrAns, WebExtrAns was intended to test how far it is possible to go in the field of Answer Extraction. As the name suggests, Answer Extraction (AE) techniques attempt to extract the answers to a user query from a set of documents. AE is not question answering, because it does not try to generate the answer from scratch. In other words, if a specific sentence in the documents directly answers a query, it is retrieved; if the answer is not explicitly expressed in the documents, an AE system will not try to infer it. AE is thus a specific type of information retrieval.

The domains of application of AE include:

Interfaces to machine-readable technical manuals.
On-line help systems for complex software.
Help desk systems in large organisations.
Public inquiry systems over the Internet.

In all of these applications it is important to find all the answers to the question (high recall), since technical manuals generally explain things only once. It is equally important to find only the answers, without garbage (high precision), since the user is interested in getting an answer quickly. Achieving high recall and precision over small retrieved units (only sentences or parts of sentences) requires a degree of natural language processing. ExtrAns and WebExtrAns aim to test whether current NLP technologies can be used for AE over technical manuals.
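To make the two measures concrete, here is a minimal sketch of how precision and recall could be computed for a single query when the retrieved units are sentences. The sentence identifiers and the function name `precision_recall` are illustrative assumptions, not part of the WebExtrAns system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the sentences the system returned
    relevant:  identifiers of the sentences that really answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of returned sentences that are real answers.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the real answers that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sentence identifiers, for illustration only.
retrieved = ["s12", "s40", "s77"]
relevant = ["s12", "s77"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every answer is found, but one of the three returned sentences is garbage, so recall is perfect while precision is not.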

By NLP technologies we mean, among others:

Full parsing of the sentences.
Disambiguation.
Anaphora resolution.
Construction of a Minimal Logical Form (MLF).

The data used is a technical manual of a commercial aircraft, the Airbus A320, made available by SR Technics, a subsidiary of the SAirGroup (formerly Swissair group). This manual has the following characteristics:

The size is over 100 Mb, far larger than ExtrAns' manpages.
The format is SGML. This allows us to use SGML/XML tools and build a system that is more portable than ExtrAns.
The English used in the data is defined by AECMA's Simplified English (SE). The use of documents in SE simplifies some problems related to NLP, such as lexical and syntactic ambiguity, anaphora resolution, ellipsis, and tense. But the hard problems remain to practically the same degree: presuppositions, quantification, aspect, lexical semantics, etc.

WebExtrAns was a joint project between the University of Zurich (Switzerland) and the University of Tartu (Estonia) and was privately funded.

Example of Interaction with the system

Terminology

One of the main obstacles in processing technical manuals is the large amount of domain-specific terminology. In the course of the project we experimented with different tools for terminology extraction. We developed our own tools for structuring the terminology by synonymy and hyperonymy, supported by our own visualization tools.
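As an illustration of what structuring terminology by synonymy and hyperonymy can look like, the following is a minimal sketch, not the project's actual tools. The class `TermBase`, its methods, and the example terms are all hypothetical: synonymous terms are grouped into sets, and hyperonymy links connect one set to a more general one.

```python
from collections import defaultdict


class TermBase:
    """Toy term store: synonym sets linked by hyperonymy (hypothetical)."""

    def __init__(self):
        self.synset_of = {}                # term -> synset id
        self.members = defaultdict(set)    # synset id -> terms in the set
        self.hypernyms = defaultdict(set)  # synset id -> more general synset ids

    def add_synset(self, synset_id, terms):
        for term in terms:
            key = term.lower()
            self.synset_of[key] = synset_id
            self.members[synset_id].add(key)

    def add_hypernym(self, child_id, parent_id):
        self.hypernyms[child_id].add(parent_id)

    def synonyms(self, term):
        key = term.lower()
        sid = self.synset_of.get(key)
        return set() if sid is None else self.members[sid] - {key}

    def ancestors(self, term):
        """All synset ids reachable by following hyperonymy links upwards."""
        sid = self.synset_of.get(term.lower())
        seen = set()
        stack = [] if sid is None else list(self.hypernyms[sid])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.hypernyms[current])
        return seen


# Invented example terms, for illustration only.
tb = TermBase()
tb.add_synset("door", ["door"])
tb.add_synset("cargo_door", ["cargo door", "cargo compartment door"])
tb.add_hypernym("cargo_door", "door")

print(tb.synonyms("cargo door"))
print(tb.ancestors("cargo compartment door"))
```

With such a structure, a query that mentions one term of a synonym set can be matched against document sentences that use another, and a query about a general term can be related to its more specific terms.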

Project results

The collapse of Swissair deprived us of our potential partner for a commercial exploitation of the results obtained in the project. In addition, we could not perform a direct evaluation of usability with the target users of the system (Aircraft Maintenance Technicians).

However, the project delivered interesting scientific results, as witnessed by the list of publications (see below). Although the original focus of the project was on the Answer Extraction problem, the nature of the documents to be analyzed (Aircraft Maintenance Manuals) brought us into the area of Terminology. We had to explore various Terminology Extraction techniques and find ways to exploit the extracted terminology within our NLP system.

A working prototype that shows the effectiveness of our Answer Extraction approach is available internally. Unfortunately, it cannot be made available on the web because of unresolved copyright issues regarding the analyzed documents. However, an earlier prototype targeted at a different domain can be accessed here.

We could summarize the main scientific results of our work as follows:

Terminology plays a central role in the processing of Technical Manuals.
The complexity of parsing technical manuals can be ascribed in large part (46% in our case) to terminology.
Terminological Variants need to be taken into account, as effective standardization is still not completely achieved.
Even if complete standardization were achieved within the manuals, the user of a query system could come up with a novel variant.
We implemented a prototype showing effective ways to deal with existing and novel variants.
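The point about novel variants can be illustrated with a small sketch. This is not the method used in the prototype; it is a deliberately naive normalization, assumed here only for illustration, that maps a term to an order-independent key of its content words so that a user's rephrasing can still be resolved to a term known from the manual. The names `variant_key`, `build_index`, `resolve`, the stopword list, and the example terms are hypothetical.

```python
import re

# Function words ignored when comparing variants (an assumed, minimal list).
STOPWORDS = {"of", "the", "a", "an", "for", "in", "on"}


def variant_key(term):
    """Order-independent key built from the content words of a term."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return tuple(sorted(t for t in tokens if t not in STOPWORDS))


def build_index(canonical_terms):
    """Map each variant key to the canonical term found in the manual."""
    return {variant_key(term): term for term in canonical_terms}


def resolve(query_term, index):
    """Return the canonical term for a (possibly novel) variant, or None."""
    return index.get(variant_key(query_term))


# Invented terms, for illustration only.
index = build_index(["cargo compartment door", "hydraulic pump"])

print(resolve("door of the cargo compartment", index))
print(resolve("Hydraulic-pump", index))
print(resolve("fuel pump", index))
```

A normalization this simple will conflate terms that differ only in word order and will miss morphological or lexical variants, which is why a realistic system needs linguistically informed variant detection.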

Researchers

University of Zurich

Michael Hess, project director
hess@ifi.unizh.ch
Diego Mollá (Nov. 1999 - Apr. 2000)
molla@ifi.unizh.ch
Fabio Rinaldi (Jan. 2000 - Dec. 2002)
rinaldi@ifi.unizh.ch
Rolf Schwitter (Nov. 1999 - Nov. 2000)
schwitt@ifi.unizh.ch
James Dowdall (May 2001 - Dec. 2002)
dowdall@ifi.unizh.ch

University of Tartu

Mare Koit, coordinator in Tartu
koit@cs.ut.ee
Kadri Vider
kvider@psych.ut.ee
Kaarel Kaljurand
kaarel@ut.ee
Neeme Kahusk
nkahusk@psych.ut.ee

Publications originated from the Project

[Rinaldi et al. 2004a]: Fabio Rinaldi, Michael Hess, James Dowdall, Diego Mollá, Rolf Schwitter. Question Answering in Terminology-rich Technical Domains, "New Directions in Question Answering", Maybury, M. T. editor. 2004. AAAI/MIT Press.
[Schwitter et al. 2004a]: Rolf Schwitter, Fabio Rinaldi, Simon Clematide. The Importance of How-Questions in Technical Domains. Question-Answering workshop of TALN 04, Fez, Morocco, 22nd April 2004.
[Mollá et al. 2003b]: Diego Mollá, Fabio Rinaldi, Rolf Schwitter, James Dowdall, Michael Hess. Answer Extraction from Technical Texts. IEEE Intelligent Systems, 18(4):12-17, July/August 2003.
[Mollá et al. 2003a]: Diego Mollá, Rolf Schwitter, Fabio Rinaldi, James Dowdall, Michael Hess. NLP for Answer Extraction in Technical Domains. Accepted for publication at the EACL 03 Workshop: Natural Language Processing for Question Answering, Budapest.
[Rinaldi et al. 2003b]: Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Magnus Karlsson. The role of technical Terminology in Question Answering. TIA 2003, Terminologie et Intelligence Artificielle, Strasbourg.
[Rinaldi et al. 2003a]: Fabio Rinaldi, James Dowdall, Kaarel Kaljurand, Michael Hess and Diego Mollá. Exploiting Paraphrases in a Question Answering System. ACL-2003, Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications, pp. 25-32. July 11th, Sapporo, Japan.
[Rinaldi et al. 2002c]: Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Mare Koit and Neeme Kahusk: Terminology as Knowledge in Answer Extraction. TKE-2002: 6th International Conference on Terminology and Knowledge Engineering, 28th-30th August 2002, Nancy, France.
[Rinaldi et al. 2002b]: Fabio Rinaldi, James Dowdall, Michael Hess, Diego Mollá and Rolf Schwitter: Towards Answer Extraction: An application to Technical Domains. ECAI-2002, Lyon, 21-26 July, 2002. In: F. van Harmelen (ed.), ECAI 2002. Proceedings of the 15th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2002.
[Rinaldi et al. 2002a]: Fabio Rinaldi, Michael Hess, Diego Mollá, Rolf Schwitter, James Dowdall, Gerold Schneider, and Rachel Fournier: Answer Extraction in Technical Domains. CICLing-2002, Mexico City, 17-23 February, 2002. Available from Springer Verlag: Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, Vol. 2276, pp. 360-369.
[Hess et al. 2002]: Michael Hess, James Dowdall, Fabio Rinaldi: The Challenge of Technical Text. LREC-2002, Workshop on Question Answering: Strategy and Resources, Las Palmas, 28 May 2002.
[Dowdall et al. 2002]: James Dowdall, Michael Hess, Neeme Kahusk, Kaarel Kaljurand, Mare Koit, Fabio Rinaldi and Kadri Vider: Technical Terminology as a Critical Resource. LREC-2002, Las Palmas, 29-31 May 2002.
[Höfler 2002]: Stefan Höfler, Link2Tree: A Dependency-Constituency Converter. Lizentiatsarbeit der Philosophischen Fakultät der Universität Zürich, April 2002.

Fabio Rinaldi (rinaldi@ifi.unizh.ch). Last update:
