The NUMEX Grammar

Once the numbers grammar has been applied, numerical expressions are marked up as W or PHR elements whose C attribute has one of the values CD, ORD, FRAC or FRACORD. These denote, respectively, cardinals, ordinals, fractions, and elements that may be either ordinals or parts of fractions. The following sample strings show the markup produced by the numbers grammar.

  <PHR C='CD'><W C='W'>a</W> <W C='W'>dozen</W></PHR>
  <W C='CD'>15</W>
  <W C='CD'>145,954</W>
  <W C='CD'>1992</W>
  <W C='CD'>5.93</W>
  <W C='ORD'>50th</W>
  <W C='ORD'>first</W>
  <W C='FRACORD'>ninth</W>
  <PHR C='CD'><W C='CD'>144.5</W> <W C='W'>million</W></PHR>
  <PHR C='CD'><W C='CD'>2.5</W> <W C='W'>billion</W></PHR>
  <PHR C='CD'><W C='CD'>34</W> <W C='FRAC'>3/4</W></PHR>
  <PHR C='FRAC'><W C='CD'>two</W><W C='DASH'>-</W><W C='W'>thirds</W></PHR>
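
As an illustration of how this markup can be read, the following minimal Python sketch collects the outermost numerical expressions from a marked-up fragment. It is not part of the TTT distribution: the function name numeric_expressions and the wrapper element name "root" are assumptions made for the example, and only the element names W and PHR and the four C values come from the description above.

```python
import xml.etree.ElementTree as ET

# C attribute values that the numbers grammar assigns to numerical expressions.
NUMERIC_CLASSES = {"CD", "ORD", "FRAC", "FRACORD"}


def numeric_expressions(xml_fragment):
    """Return (class, text) pairs for the outermost numeric W or PHR elements."""
    # Wrap the fragment so that it has a single root element (hypothetical name).
    root = ET.fromstring("<root>" + xml_fragment + "</root>")

    def walk(elem):
        for child in elem:
            if child.tag in ("W", "PHR") and child.get("C") in NUMERIC_CLASSES:
                # A numeric phrase is reported whole; its parts are not revisited.
                yield child.get("C"), "".join(child.itertext())
            else:
                yield from walk(child)

    return list(walk(root))


sample = (
    "<PHR C='CD'><W C='CD'>34</W> <W C='FRAC'>3/4</W></PHR> "
    "<W C='ORD'>50th</W> "
    "<PHR C='FRAC'><W C='CD'>two</W><W C='DASH'>-</W><W C='W'>thirds</W></PHR>"
)
for c_value, text in numeric_expressions(sample):
    print(c_value, text)
```

Reporting only the outermost element means that a phrase such as "34 3/4" is returned once as a CD phrase rather than as separate CD and FRAC words, which is the unit a later grammar would need to work with.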

The NUMEX grammar identifies monetary expressions and percentages. A percentage consists of a cardinal number word or phrase followed by the word "percent" or the percent symbol. A monetary expression combines a cardinal number word or phrase with a currency expression occurring either to the left or to the right of the cardinal number. The following sample strings show the markup produced by the NUMEX grammar.

  <NUMEX TYPE='PERCENT'><PHR C='CD'><W C='CD'>4</W> <W C='FRAC'>1/2</W>
  </PHR><W C='W'>%</W></NUMEX>
  <NUMEX TYPE='PERCENT'><W C='CD'>20</W><W C='W'>%</W></NUMEX>
  <NUMEX TYPE='PERCENT'><W C='CD'>5.9</W><W C='W'>%</W></NUMEX>
  <NUMEX TYPE='MONEY'><W C='W'>$</W><W C='CD'>366.85</W></NUMEX>
  <NUMEX TYPE='MONEY'><W C='CD'>141.93</W> <W C='W'>yen</W></NUMEX>
  <NUMEX TYPE='MONEY'><W C='W'>$</W><PHR C='CD'><W C='CD'>102</W> 
  <W C='W'>million</W></PHR></NUMEX>
  <NUMEX TYPE='MONEY'><W C='CD'>89</W> <W C='W'>cents</W></NUMEX>
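
The two patterns described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the grammar's actual rule format: the names CURRENCY, PERCENT, match_numex and tag_numex are hypothetical, the small currency set stands in for the lexicon, and each input item is assumed to be a (C value, text) pair in which a whole CD phrase counts as a single item.

```python
# Hypothetical stand-ins for lexicon lookups; not the contents of the real lexicon.
CURRENCY = {"$", "yen", "cents"}
PERCENT = {"%", "percent"}


def match_numex(items, i):
    """Return (TYPE, length) if a NUMEX starts at items[i], otherwise None."""
    if i + 1 >= len(items):
        return None
    c_value, text = items[i]
    next_c, next_text = items[i + 1]
    if c_value == "CD":
        # Cardinal followed by "percent" or the percent symbol.
        if next_text in PERCENT:
            return ("PERCENT", 2)
        # Cardinal followed by a currency expression on its right.
        if next_text in CURRENCY:
            return ("MONEY", 2)
    # Currency expression on the left of a cardinal.
    if text in CURRENCY and next_c == "CD":
        return ("MONEY", 2)
    return None


def tag_numex(items):
    """Group items into (TYPE or None, [items]) spans, scanning left to right."""
    spans = []
    i = 0
    while i < len(items):
        match = match_numex(items, i)
        if match is not None:
            numex_type, length = match
            spans.append((numex_type, items[i:i + length]))
            i += length
        else:
            spans.append((None, items[i:i + 1]))
            i += 1
    return spans


print(tag_numex([("W", "$"), ("CD", "102 million")]))
print(tag_numex([("CD", "5.9"), ("W", "%")]))
print(tag_numex([("CD", "1992"), ("W", "was")]))
```

A cardinal with no adjacent currency or percent item, such as the year in the last call, is left unmarked, which matches the description of NUMEX as covering only monetary expressions and percentages.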

The grammar consults a lexicon, $TTT/LEX/numex.lex, to recognise currency names. The entries in this lexicon were derived from a variety of sources available on the web. The currency list is incomplete and is probably not entirely accurate. Moreover, certain currency unit names are potentially ambiguous, e.g. "mark" and "pound", and we have restricted their lexical entries to avoid overgeneration. A more context-sensitive approach might allow such expressions to be disambiguated reliably.