Converting and Annotating Quantitative Data Tables
Companies, governmental agencies and scientists produce large amounts of quantitative (research) data, consisting of measurements that range from the surface temperature of an ocean to the viscosity of a sample of mayonnaise. Such measurements are stored in tables in, for example, spreadsheet files and research reports. To integrate and reuse such data, a semantic description of the data is needed. However, the notation used is often ambiguous, which makes automatic interpretation and conversion to RDF or another suitable format difficult. For example, the table header cell ``f (Hz)'' refers to frequency measured in hertz, but the symbol ``f'' can also refer to the unit farad or to the quantities force or luminous flux. Current annotation tools for this task either work on less ambiguous data or perform a more limited task. We introduce new disambiguation strategies based on an ontology, which improve performance on ``sloppy'' datasets not yet targeted by existing systems.