Restricted information

WebExtrAns

WebExtrAns was a privately funded project that ran from Nov. 1999 to Dec. 2002. Like ExtrAns, WebExtrAns was intended to test how far it is possible to go in the world of Answer Extraction. As the name suggests, Answer Extraction (AE) techniques attempt to extract the answers to a user query from a set of documents. AE is not question answering because it does not try to generate the answer from scratch. In other words, if a specific sentence in the documents directly answers a query, it is retrieved. But if the answer is not explicitly expressed in the documents, an AE system will not try to infer it. AE is a specific type of information retrieval.

The domain of application of AE includes:

In all of these applications it is important to find all the answers to the question (high recall), since technical manuals generally explain things only once. It is equally important to find only the answers, without garbage (high precision), since the user is interested in getting an answer quickly. Achieving high recall and precision when the retrieved units are small (only sentences or parts of sentences) requires a degree of natural language processing. ExtrAns and WebExtrAns aim at testing whether it is possible to use current NLP technologies in AE over technical manuals.
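To make the two measures concrete, here is a minimal sketch of how precision and recall can be computed for one query. The sentence identifiers and the function name precision_recall are hypothetical and chosen only for illustration; they are not taken from the WebExtrAns system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the sentences the system returned
    relevant:  IDs of the sentences that really answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of returned sentences that are real answers.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of real answers that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: the manual answers the question in exactly one
# sentence (s12), and the system returns it together with three others.
relevant = {"s12"}
retrieved = {"s12", "s40", "s87", "s90"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.25 1.0
```

In this invented case recall is perfect but three of the four returned sentences are garbage, which is exactly the situation the user of a technical manual wants to avoid.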

By NLP technologies we mean, among others:

The data used is a technical manual of a commercial aircraft, the AIRBUS 320, made available by SR Technics, a subsidiary within the SAirGroup (formerly Swissair group). This manual has the following characteristics:

WebExtrAns was a joint project between the University of Zurich (Switzerland) and the University of Tartu (Estonia) and was privately funded.


Example of Interaction with the system


Terminology

One of the main obstacles in processing technical manuals is the large amount of domain-specific terminology. In the course of the projects we experimented with different tools for terminology extraction. We developed our own tools for structuring the terminology by synonymy and hyperonymy, supported by our own visualization tools.
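As a rough illustration of what structuring a term list by synonymy and hyperonymy can look like, the sketch below groups spelling variants as synonyms and links each multi-word term to the longest known term that forms its tail (for example, "cargo door" under "door"). This head-based heuristic, the function names, and the sample terms are all assumptions made for this example; the page does not describe how the project's own tools work.

```python
from collections import defaultdict


def normalize(term):
    """Lower-case a term and treat hyphens as spaces (assumed variant rule)."""
    return " ".join(term.lower().replace("-", " ").split())


def group_synonyms(terms):
    """Group surface forms that share the same normalized form."""
    groups = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return dict(groups)


def hypernym_links(normalized_terms):
    """Map each term to the longest known term that is a proper suffix of it."""
    known = set(normalized_terms)
    links = {}
    for term in known:
        tokens = term.split()
        # Dropping leading modifiers one at a time tries the longest suffix first.
        for start in range(1, len(tokens)):
            candidate = " ".join(tokens[start:])
            if candidate in known:
                links[term] = candidate
                break
    return links


# Hypothetical term list, not taken from the actual manual.
terms = ["cargo door", "Cargo-door", "door", "aft cargo door",
         "hydraulic pump", "pump"]
groups = group_synonyms(terms)
links = hypernym_links(groups)
for term in sorted(links):
    print(term, "->", links[term])
```

A heuristic of this kind only captures hyperonymy that is visible in the wording of the terms; relations between unrelated strings would need other resources.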

Project results

The collapse of Swissair deprived us of our potential partner for a commercial exploitation of the results obtained in the project. In addition, we could not perform a direct evaluation of usability with the target users of the system (Aircraft Maintenance Technicians).

However, the project delivered interesting scientific results, as witnessed by the list of publications (see below). Although the original focus of the project was on the Answer Extraction problem, the nature of the documents to be analyzed (Aircraft Maintenance Manuals) brought us into the area of Terminology. We had to explore various Terminology Extraction techniques and find ways to exploit the extracted terminology within our NLP system.

A working prototype which shows the effectiveness of our Answer Extraction approach is internally available. Unfortunately, it cannot be made available on the web because of unresolved copyright issues regarding the analyzed documents. However, an earlier prototype targeted at a different domain can be accessed here.

We could summarize the main scientific results of our work as follows:


Researchers

University of Zurich

University of Tartu


Publications originated from the Project

[Rinaldi et al. 2004a]
Fabio Rinaldi, Michael Hess, James Dowdall, Diego Mollá, Rolf Schwitter. Question Answering in Terminology-rich Technical Domains, "New Directions in Question Answering", Maybury, M. T. editor. 2004. AAAI/MIT Press.

[Schwitter et al. 2004a]
Rolf Schwitter, Fabio Rinaldi, Simon Clematide. The Importance of How-Questions in Technical Domains. Question-Answering workshop of TALN 04, Fez, Morocco, 22nd April 2004.

[Mollá et al. 2003b]
Diego Mollá, Fabio Rinaldi, Rolf Schwitter, James Dowdall, Michael Hess. Answer Extraction from Technical Texts. IEEE Intelligent Systems, 18(4):12-17, July/August 2003.

[Mollá et al. 2003a]
Diego Mollá, Rolf Schwitter, Fabio Rinaldi, James Dowdall, Michael Hess. NLP for Answer Extraction in Technical Domains. Accepted for publication at the EACL 03 Workshop: Natural Language Processing for Question Answering, Budapest.

[Rinaldi et al. 2003b]
Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Magnus Karlsson. The role of technical Terminology in Question Answering. TIA 2003, Terminologie et Intelligence Artificielle, Strasbourg.

[Rinaldi et al. 2003a]
Fabio Rinaldi, James Dowdall, Kaarel Kaljurand, Michael Hess and Diego Mollá. Exploiting Paraphrases in a Question Answering System. ACL-2003, Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications, pp. 25-32. July 11th, Sapporo, Japan.

[Rinaldi et al. 2002c]
Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Mare Koit and Neeme Kahusk: Terminology as Knowledge in Answer Extraction. TKE-2002: 6th International Conference on Terminology and Knowledge Engineering, 28th-30th August 2002, Nancy, France.

[Rinaldi et al. 2002b]
Fabio Rinaldi, James Dowdall, Michael Hess, Diego Mollá and Rolf Schwitter: Towards Answer Extraction: An application to Technical Domains. ECAI-2002, Lyon, 21-26 July, 2002. In: F. van Harmelen (ed.), ECAI 2002. Proceedings of the 15th European Conference on Artificial Intelligence, IOS Press, Amsterdam, 2002.

[Rinaldi et al. 2002a]
Fabio Rinaldi, Michael Hess, Diego Mollá, Rolf Schwitter, James Dowdall, Gerold Schneider, and Rachel Fournier: Answer Extraction in Technical Domains. CICLing-2002, Mexico City, 17-23 February, 2002. Available from Springer Verlag: Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, Vol. 2276, pp. 360-369.

[Hess et al. 2002]
Michael Hess, James Dowdall, Fabio Rinaldi: The Challenge of Technical Text. LREC-2002, Workshop on Question Answering: Strategy and Resources, Las Palmas, 28 May 2002.

[Dowdall et al. 2002]
James Dowdall, Michael Hess, Neeme Kahusk, Kaarel Kaljurand, Mare Koit, Fabio Rinaldi and Kadri Vider: Technical Terminology as a Critical Resource. LREC-2002, Las Palmas, 29-31 May 2002.

[Höfler 2002]
Stefan Höfler, Link2Tree: A Dependency-Constituency Converter. Lizentiatsarbeit der Philosophischen Fakultät der Universität Zürich, April 2002.



Fabio Rinaldi (rinaldi@ifi.unizh.ch).
