WebExtrAns was a privately funded project that ran from November 1999 to December 2002. Like ExtrAns, WebExtrAns was intended to test how far it is possible to go in Answer Extraction. As the name suggests, Answer Extraction (AE) techniques attempt to extract the answers to a user query from a set of documents. AE is not question answering, because it does not try to generate the answer from scratch. In other words, if a specific sentence in the documents directly answers a query, it is retrieved; if the answer is not explicitly expressed in the documents, an AE system will not try to infer it. AE is thus a specific type of information retrieval.

The domain of application of AE includes:

In all of these applications it is important to find all the answers to the question (high recall), since technical manuals generally explain things only once. It is equally important to find only the answers, without irrelevant material (high precision), since the user wants an answer quickly. Achieving high recall and precision over such small retrieval units (sentences or parts of sentences) requires a degree of natural language processing. ExtrAns and WebExtrAns aim to test whether current NLP technologies can be used for AE over technical manuals.
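
As a minimal illustration of the two measures mentioned above, the following Python sketch computes precision and recall for one query. The sentence identifiers and the function name precision_recall are hypothetical and chosen only for this example; they are not part of ExtrAns or WebExtrAns.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the sentences the system returned
    relevant:  identifiers of the sentences that really answer the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    # Precision: share of returned sentences that are real answers.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the real answers that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sentence identifiers, for illustration only.
retrieved = ["s3", "s7", "s9"]   # what the system returned
relevant = ["s3", "s7"]          # what actually answers the query
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case every answer is found (recall is 1.0), but one of the three returned sentences is not an answer, so precision is two thirds.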

By NLP technologies we mean, among others:

The data used is a technical manual of a commercial aircraft, the Airbus A320, made available by SR Technics, a subsidiary of the SAirGroup (formerly the Swissair group). This manual has the following characteristics:

WebExtrAns was a joint project between the University of Zurich (Switzerland) and the University of Tartu (Estonia) and was privately funded.

Example of Interaction with the system


One of the main obstacles in processing technical manuals is the large amount of domain-specific terminology. In the course of the projects we experimented with different tools for terminology extraction. We developed our own tools for structuring the terminology by synonymy and hyperonymy, supported by our own visualization tools.
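
To make the idea of structuring terminology by synonymy and hyperonymy more concrete, here is a small Python sketch of one possible data structure. It is an assumption made for illustration, not the project's actual tooling: the class TermBase, its methods, and the example terms are all invented.

```python
from collections import defaultdict


class TermBase:
    """Toy term store: synonyms share a synset, synsets form a hypernym graph."""

    def __init__(self):
        self._synset_of = {}                 # term -> synset id
        self._members = defaultdict(set)     # synset id -> terms
        self._hypernyms = defaultdict(set)   # synset id -> broader synset ids

    def add_synset(self, synset_id, terms):
        # All terms in one synset are treated as synonyms.
        for term in terms:
            key = term.lower()
            self._synset_of[key] = synset_id
            self._members[synset_id].add(key)

    def add_hypernym(self, child_id, parent_id):
        # parent_id names a broader concept than child_id.
        self._hypernyms[child_id].add(parent_id)

    def synonyms(self, term):
        sid = self._synset_of.get(term.lower())
        return set(self._members[sid]) if sid is not None else set()

    def ancestors(self, term):
        # Collect every broader synset reachable from the term's synset.
        sid = self._synset_of.get(term.lower())
        if sid is None:
            return set()
        seen = set()
        stack = list(self._hypernyms.get(sid, ()))
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._hypernyms.get(current, ()))
        return seen


# Invented example terms; they are not taken from the manual.
terms = TermBase()
terms.add_synset("cargo_door", ["cargo door", "cargo compartment door"])
terms.add_synset("door", ["door"])
terms.add_synset("fuselage_opening", ["fuselage opening"])
terms.add_hypernym("cargo_door", "door")
terms.add_hypernym("door", "fuselage_opening")

print(terms.synonyms("Cargo Door"))
print(terms.ancestors("cargo door"))
```

With a structure of this kind, a query that uses one variant of a term can be matched against sentences that use a synonym or a broader term.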

Project results

The collapse of Swissair deprived us of our potential partner for a commercial exploitation of the results obtained in the project. In addition, we could not perform a direct usability evaluation with the target users of the system (Aircraft Maintenance Technicians).

However, the project delivered interesting scientific results, as witnessed by the list of publications (see below). Although the original focus of the project was on the Answer Extraction problem, the nature of the documents to be analyzed (Aircraft Maintenance Manuals) brought us into the area of Terminology. We had to explore various Terminology Extraction techniques and find ways to exploit the extracted terminology within our NLP system.

A working prototype that shows the effectiveness of our Answer Extraction approach is available internally. Unfortunately, it cannot be made available on the web because of unresolved copyright issues regarding the analyzed documents. However, an earlier prototype targeting a different domain can be accessed here.

The main scientific results of our work can be summarized as follows:


