The TIMEX Grammar

The TIMEX grammar also relies on input from the numbers grammar for date expressions involving numbers such as the following:

  <TIMEX TYPE='DATE'><W C='W'>July</W> <W C='CD'>31</W><W C='CM'>,</W> 
  <W C='CD'>1989</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='CD'>400</W> <W C='W'>years</W> <W C='W'>ago</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='ORD'>first</W> <W C='CD'>six</W> 
  <W C='W'>months</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='ORD'>third</W><W C='DASH'>-</W>
  <W C='W'>quarter</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='CD'>1986</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>fiscal</W> <W C='CD'>1990</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Dec.</W> <W C='CD'>6</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>end</W> <W C='W'>of</W> <W C='CD'>1973</W></TIMEX>

Certain dates are very regular and are comparatively easy to recognise with a high degree of certainty. These include conventional numeric date notations as well as full date expressions:

  <TIMEX TYPE='DATE'><W C='NUM'>15/12/62</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='NUM'>12/15/62</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='NUM'>3-9-58</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='NUM'>3-9-1958</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='NUM'>03-9-58</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='ORD'>25th</W> <W C='W'>August</W> 
  <W C='CD'>1997</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Aug</W> <W C='CD'>25</W> <W C='CD'>1997</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Monday</W> <W C='ORD'>25th</W> <W C='W'>August</W> 
  <W C='CD'>1997</W></TIMEX>
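
To illustrate why the numeric notations are easy to recognise, here is a minimal Python sketch of a pattern that accepts the numeric examples above. It is not the rule used in the TIMEX grammar: the names NUMERIC_DATE and looks_like_numeric_date are hypothetical, and the accepted separators and the day/month range checks are assumptions made for the illustration.

```python
import re

# Assumed shape: one or two digits, a separator, one or two digits, the
# same separator again, then a two- or four-digit year.
NUMERIC_DATE = re.compile(r"^(\d{1,2})([/-])(\d{1,2})\2(\d{2}|\d{4})$")


def looks_like_numeric_date(token):
    """Return True if the token has the shape of a numeric date."""
    m = NUMERIC_DATE.match(token)
    if not m:
        return False
    first, second = int(m.group(1)), int(m.group(3))
    # Accept either day/month or month/day order, since both occur
    # in the examples (15/12/62 and 12/15/62).
    day_month = 1 <= first <= 31 and 1 <= second <= 12
    month_day = 1 <= first <= 12 and 1 <= second <= 31
    return day_month or month_day


for token in ["15/12/62", "12/15/62", "3-9-58", "3-9-1958", "03-9-58"]:
    print(token, looks_like_numeric_date(token))
```

Requiring the same separator in both positions and a plausible day and month keeps the pattern from firing on arbitrary digit strings such as "40/40/62".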

Certain underspecified date expressions are harder to recognise with complete accuracy. For example, a month name occurring without any other date elements may not be a month at all, especially in contexts where capitalisation does not clarify matters:

  March would be a good time.
  March forward with a happy heart!
  MAY MEETING CONFIRMED. 
  PRESIDENT MAY ATTEND.

In these cases, the TIMEX rules attempt to take context into account, but accurate performance is not guaranteed.

Four-digit numbers starting with "19" are usually years, but again context must be used, since they could simply be ordinary numbers:

  In 1994 people were happy.
  1994 people attended.
  After 1994 people were happy.
  After 1994 people died, the product was recalled.

Using the preceding word as context, the TIMEX grammar successfully identifies the first and third occurrences of "1994" as dates and does not recognise the second occurrence as a date. However, it mistakenly identifies "1994" in the fourth example as a date, since the relevant context check is satisfied whenever the preceding word is a certain kind of preposition.
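
The behaviour on these four examples can be reproduced with a small Python sketch of a preceding-word check. This is an illustration only and not the grammar's actual rule: the preposition list YEAR_PREPOSITIONS and the function tag_years are hypothetical, and the whitespace tokenisation is a simplification.

```python
import re

# Hypothetical list of prepositions that license a year reading; the
# grammar's real list is not reproduced here.
YEAR_PREPOSITIONS = {"in", "after", "before", "since", "during", "until", "by"}

# Four-digit numbers starting with "19".
YEAR = re.compile(r"^19\d\d$")


def tag_years(sentence):
    """Return the tokens judged to be years, using only the preceding word."""
    tokens = sentence.split()
    years = []
    for i, tok in enumerate(tokens):
        word = tok.strip(".,")
        if not YEAR.match(word):
            continue
        prev = tokens[i - 1].lower() if i > 0 else ""
        if prev in YEAR_PREPOSITIONS:
            years.append(word)
    return years


for s in [
    "In 1994 people were happy.",
    "1994 people attended.",
    "After 1994 people were happy.",
    "After 1994 people died, the product was recalled.",
]:
    print(s, tag_years(s))
```

Because the check looks only at the word before the number, the sketch accepts "1994" in the fourth sentence just as the grammar does. Avoiding that error would require knowing that "1994 people" is the subject of "died", which a one-word context window cannot tell.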

In addition to date expressions which denote specific temporal locations, certain relative date expressions are also recognised. For example, the MUC-7 guidelines usually require expressions involving "ago", "later", "before", "next" etc. to be recognised:

  <TIMEX TYPE='DATE'><W C='CD'>Three</W> <W C='W'>years</W> <W C='W'>ago</W>
  </TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>A</W> <W C='W'>week</W> <W C='W'>ago</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>a</W> <W C='W'>decade</W> <W C='W'>ago</W></TIMEX>
  <TIMEX TYPE='DATE'><PHR C='QUANT'><W C='W'>several</W></PHR> <W C='W'>years</W> 
  <W C='W'>ago</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>a</W> <W C='W'>year</W> <W C='W'>earlier</W></TIMEX>
  <TIMEX TYPE='DATE'><PHR C='RANGE'><W C='CD'>15</W> <W C='W'>or</W> 
  <W C='CD'>20</W></PHR> <W C='W'>years</W> <W C='W'>later</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>next</W> <W C='CD'>three</W> <W C='W'>years</W></TIMEX>

The MUC-7 guidelines also require certain date expressions involving prepositions like "before" and "after" to include the complement of the preposition, whether it is a noun phrase or a full sentence. Thus the following examples should be marked up as indicated:

  <TIMEX TYPE='DATE'>Three years after his death</TIMEX>
  <TIMEX TYPE='DATE'>Three years after the plane exploded and crashed into
  the sea</TIMEX>.

However, it is a full parsing problem to determine the exact extent of the complement of a temporal preposition, and therefore we did not attempt to recognise such dates in the TIMEX grammar. In the following examples, on the other hand, we were able to identify the date since the punctuation/word following the temporal preposition indicates that it has no complement:

  He died <TIMEX TYPE='DATE'>three years after</TIMEX>.
  He had visited <TIMEX TYPE='DATE'>three years before</TIMEX>, it was reported.
  He died <TIMEX TYPE='DATE'>three years after</TIMEX> and his wife remarried.
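
The idea of accepting a temporal preposition only when the next token shows that it has no complement can be sketched in Python as follows. The word lists (NUMBERS, UNITS, TEMPORAL_PREPS, CLAUSE_BOUNDARY) and the function bare_relative_dates are hypothetical and much smaller than anything a real grammar would need; the sketch also treats the end of the input as a boundary, which is an assumption.

```python
import re

# Hypothetical, deliberately small word lists for the illustration.
NUMBERS = {"a", "one", "two", "three", "four", "five", "several"}
UNITS = {"day", "days", "week", "weeks", "month", "months", "year", "years"}
TEMPORAL_PREPS = {"after", "before"}
# Tokens assumed to signal that the preposition has no complement.
CLAUSE_BOUNDARY = {".", ",", ";", ":", "!", "?", "and", "but", "or"}


def bare_relative_dates(sentence):
    """Find 'NUMBER UNIT after/before' spans not followed by a complement."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    spans = []
    for i in range(len(tokens) - 2):
        num, unit, prep = (t.lower() for t in tokens[i:i + 3])
        if not ((num in NUMBERS or num.isdigit())
                and unit in UNITS and prep in TEMPORAL_PREPS):
            continue
        # End of input is treated like a full stop.
        following = tokens[i + 3].lower() if i + 3 < len(tokens) else "."
        if following in CLAUSE_BOUNDARY:
            spans.append(" ".join(tokens[i:i + 3]))
    return spans


print(bare_relative_dates("He died three years after."))
print(bare_relative_dates("Three years after his death"))
```

The second call returns no span, because "his" is not a boundary token and so the preposition is assumed to have a complement whose extent the sketch makes no attempt to determine.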

The TIMEX grammar recognises times as well as dates, as in the following examples:

  <TIMEX TYPE='TIME'><W C='CD'>9</W> <W C='W'>o'clock</W></TIMEX>
  <TIMEX TYPE='TIME'><W C='CD'>4</W><W C='CM'>:</W><W C='CD'>15</W> 
  <W C='W'>p.m.</W></TIMEX>
  <TIMEX TYPE='TIME'><W C='CD'>4</W><W C='CM'>:</W><W C='CD'>15</W> 
  <W C='W'>p.m.</W><W C='.'></W> <W C='W'>Tuesday</W> <W C='W'>local</W> 
  <W C='W'>time</W></TIMEX>
  <TIMEX TYPE='TIME'><W C='W'>late</W> <W C='W'>next</W> <W C='W'>day</W></TIMEX>
  <TIMEX TYPE='TIME'><W C='W'>early</W> <W C='W'>next</W> 
  <W C='W'>morning</W></TIMEX>

In examples like "Wednesday morning", the day name and the day part must be separately marked as date and time respectively:

  <TIMEX TYPE='DATE'><W C='W'>Wednesday</W></TIMEX> 
  <TIMEX TYPE='TIME'><W C='W'>morning</W></TIMEX>
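
As a rough Python illustration of this split, the sketch below marks a day name as a date and an immediately following day part as a time. The word lists and the function mark_day_and_part are hypothetical, and the W elements shown in the examples above are omitted from the output for brevity.

```python
# Hypothetical word lists; the grammar's own lexicon is not reproduced here.
DAY_NAMES = {"monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"}
DAY_PARTS = {"morning", "afternoon", "evening", "night"}


def mark_day_and_part(tokens):
    """Wrap a day name in a DATE TIMEX and a following day part in a TIME TIMEX."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in DAY_NAMES:
            out.append("<TIMEX TYPE='DATE'>%s</TIMEX>" % tok)
            # Only a day part directly after the day name is marked.
            if i + 1 < len(tokens) and tokens[i + 1].lower() in DAY_PARTS:
                out.append("<TIMEX TYPE='TIME'>%s</TIMEX>" % tokens[i + 1])
                i += 1
        else:
            out.append(tok)
        i += 1
    return " ".join(out)


print(mark_day_and_part(["on", "Wednesday", "morning"]))
```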

The date rules consult a lexicon, $TTT/LEX/timex.lex, which contains lists of words such as day and month names. Certain date expressions denote special holiday periods, and for these we included in the lexicon a list of holidays derived from a variety of sources on the web. Although the list is incomplete and somewhat inaccurate, it does allow us to recognise dates such as the following:

  <TIMEX TYPE='DATE'><W C='W'>Christmas</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>New</W> <W C='W'>Year</W><W C='W'>'s</W> 
  <W C='W'>Day</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>St.</W> <W C='W'>Patrick</W><W C='W'>'s</W>
  <W C='W'>Day</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Mother</W><W C='W'>'s</W> <W C='W'>Day</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Rosh</W> <W C='W'>Hashanah</W></TIMEX>
  <TIMEX TYPE='DATE'><W C='W'>Burns</W> <W C='W'>Night</W></TIMEX>
  <TIMEX TYPE='DATE'>
  <W C='W'>Sham</W> <W C='W'>al</W><W C='DASH'>-</W><W C='W'>Naseem</W></TIMEX>
  <TIMEX TYPE='DATE'>
  <W C='W'>Shichi</W><W C='DASH'>-</W><W C='W'>go</W><W C='DASH'>-</W><W C='W'>san</W>
  </TIMEX>
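
Lexicon lookup for multi-word holiday names can be pictured as a longest-match search over the token sequence. The Python sketch below is only an illustration: the format of timex.lex is not described here, so the sketch uses a small in-memory set named HOLIDAYS, and both that name and the function find_holidays are hypothetical. Tokens follow the tokenisation shown in the examples above, with "'s" as a separate token.

```python
# Hypothetical in-memory stand-in for the holiday entries in the lexicon.
HOLIDAYS = {
    ("christmas",),
    ("new", "year", "'s", "day"),
    ("mother", "'s", "day"),
    ("rosh", "hashanah"),
    ("burns", "night"),
}
MAX_LEN = max(len(h) for h in HOLIDAYS)


def find_holidays(tokens):
    """Return (start, end) token spans of holiday names, preferring longer matches."""
    lowered = [t.lower() for t in tokens]
    matches = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate first so that "New Year's Day"
        # is found as one unit rather than in pieces.
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in HOLIDAYS:
                matches.append((i, i + n))
                i += n
                break
        else:
            i += 1
    return matches


print(find_holidays(["On", "New", "Year", "'s", "Day", "we", "rest"]))
```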

The TIMEX grammar contains many more rules than are described here. We have endeavoured to make the comments in the grammar file as useful as possible and we hope that users will be able to explore the grammar for themselves without too much difficulty.