In-Text Citations

As noted in the introductory remarks to this chapter, citations can be syntactic or parenthetic, and their structure can be fairly complicated. The following is a parenthetic citation with a good deal of what we have termed "extent" information, which identifies the exact part of the referenced work that is being cited:

   (see, for example, Hones et al., 1972, pp23-67; Izzo, 1975, chapter 4;
   Jablonka et al., 1975, sects iii and iv, 1978, p.79; Kelly et al., 1976;
   Mallery and Nerbonne, 1979, pp78ff; Oostendorp et al., 1984).

The citation grammar will structure this as follows:

 <CITATION>(see, for example,
   <AUTHOR>
     <NAME><SURNAME>Hones</SURNAME></NAME> <ETAL>et al</ETAL>.
   </AUTHOR>, 
    <DATE>1972</DATE>, 
    <EXTENT><PAGE>pp<RANGE>23-67</RANGE></PAGE></EXTENT>; 
   <AUTHOR>
     <NAME><SURNAME>Izzo</SURNAME></NAME>
   </AUTHOR>, 
    <DATE>1975</DATE>, 
    <EXTENT><CHAPTER>chapter <RANGE>4</RANGE></CHAPTER></EXTENT>;
   <AUTHOR>
     <NAME><SURNAME>Jablonka</SURNAME></NAME> 
           <ETAL>et al</ETAL>.
   </AUTHOR>, 
    <DATE>1975</DATE>, 
    <EXTENT><SECTION>sects <RANGE>iii</RANGE> and
           <RANGE>iv</RANGE>,</SECTION></EXTENT> 
    <DATE>1978</DATE>, 
    <EXTENT><PAGE>p.<RANGE>79</RANGE></PAGE></EXTENT>; 
   <AUTHOR>
    <NAME><SURNAME>Kelly</SURNAME></NAME> 
          <ETAL>et al</ETAL>.
   </AUTHOR>,
    <DATE>1976</DATE>; 
   <AUTHOR>
     <NAME><SURNAME>Mallery</SURNAME></NAME> and 
     <NAME><SURNAME>Nerbonne</SURNAME></NAME>
   </AUTHOR>,
    <DATE>1979</DATE>, 
    <EXTENT><PAGE>pp<RANGE>78ff</RANGE></PAGE></EXTENT>; 
   <AUTHOR>
     <NAME><SURNAME>Oostendorp</SURNAME></NAME> 
           <ETAL>et al</ETAL>.
   </AUTHOR>, 
    <DATE>1984</DATE>)
 </CITATION>

It is of course arguable that this should be structured differently. We assume that the required structure depends on the task in hand and that this is just an example of the kind of detail which can be annotated if required. The default grammar in $TTT/GRAM/sgml/citationrules.gr assumes that there are two main types of citation, as noted, and that the syntactic type must have bracketed dates:

<RULE	name="syntactic" targ="&A-REW; &B-VAL; &C-REW;">
  <REL	type="REF"	match="author"	  var="A"> </REL>
  <REL                  match="&COMMA;" m_mod="QUEST" var="B"> </REL>
  <REL	type="REF"	match="bracketed_date"      var="C"> </REL>
</RULE>

Parenthetic citations, on the other hand, must have brackets around the whole string:

<RULE	name="parenthetic" targ="&A-VAL; &B-VAL; &C-VAL; &D-VAL;
                                 &E-REW; &F-VAL; ">
  <REL                  match="&LPAR;" var="A" > </REL>
  <REL	type="REF"	match="prestring" m_mod="QUEST" var="B"> </REL>
  <REL                  match="&FSTOP;" m_mod="QUEST" var="C"> </REL>
  <REL                  match="W[C='CM']" m_mod="QUEST" var="D"> </REL>
  <REL	type="REF"	match="author_date" m_mod="PLUS" var="E"> </REL>
  <REL                  match="&RPAR;" var="F" > </REL>
</RULE>

Apart from the brackets and the optional punctuation, the main contents are the author and the date as defined by the "author_date" rule. The "prestring" rule looks in the lexicon for a restricted set of introductory strings such as "for example", "cf", "see", and so on.
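
The distinction between the two citation types can be sketched outside the grammar formalism. The following Python fragment is only an illustration under simplifying assumptions: the names AUTHOR, BRACKETED_DATE, SYNTACTIC and classify are hypothetical and are not part of the default grammar, and the regular expressions are far cruder than the "author" and "bracketed_date" rules they stand in for.

```python
import re

# Hypothetical stand-in for the "author" rule: one or two capitalised
# surnames, optionally followed by "et al" (with or without a full stop).
AUTHOR = r"[A-Z][A-Za-z]+(?: and [A-Z][A-Za-z]+)?(?: et al\.?)?"

# Hypothetical stand-in for "bracketed_date": a round or square bracketed
# string that contains at least one four-digit year starting with 1 or 2.
BRACKETED_DATE = r"[(\[][^()\[\]]*[12][0-9]{3}[^()\[\]]*[)\]]"

# Syntactic citation: author, optional comma, bracketed date.
SYNTACTIC = re.compile(rf"^{AUTHOR},? ?{BRACKETED_DATE}$")


def classify(citation):
    """Return "parenthetic", "syntactic", or None for an unrecognised string."""
    text = citation.strip()
    # Parenthetic citations have brackets around the whole string.
    if text.startswith("(") and text.endswith(")"):
        return "parenthetic"
    # Syntactic citations leave the author outside the brackets.
    if SYNTACTIC.match(text):
        return "syntactic"
    return None


print(classify("(see Kelly et al., 1976)"))
print(classify("Izzo (1975, chapter 4)"))
```

The sketch checks the outer brackets first, mirroring the way the "parenthetic" rule requires the left and right parentheses as its first and last elements.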

The author information in citations is analysed in much the same way as proper names in reference lists, at least in the default case. We shall see below how to use a pre-processed reference list to replace some of the name rules. The main difference is that the citation rules allow the string "et al" to appear with a name, and wrap an ETAL tag around that string. The date information, however, is more complicated because extent data can accompany the date, as we have seen. The following two subsections look in more detail at the analysis of dates and extents.

Dates in citations

Both syntactic and parenthetic citations allow any number of date and extent pairs to follow an author name. The main distinction is that, by default, all the date-extent information must be bracketed in the syntactic case, whereas in the parenthetic case the brackets are optional. The following examples illustrate the distinction:

 (Hones et al., 1972. pp32-45; 1975, chapter 4)
 (see Hones et al [1972. pp32-45; 1975, chapter 4])
 Hones et al. (1972. pp32-45; 1975, chapter 4)

The first and second examples are parenthetic, the dates being optionally bracketed; the third is syntactic, so its date-extent information must be bracketed. The date information is handled by the following rule:

<RULE	name="simple_date" type="DISJF" targ="&S-REW;">
  <REL	type="REF"  match="simple_date_numbers" > </REL>
  <REL	type="REF"  match="date_words" > </REL>
</RULE>

As with dates in the reference list, these disjuncts handle numerical dates and date strings like "to appear" and "in press", the latter being defined in the lexicon. As before, the DATE tag is wrapped around the match. The numerical dates differ slightly from the reference list versions in that any number of date letters can appear with a citation, as in "Smith 1967a,b,c". This is handled straightforwardly by the following rules:

<RULE	name="date_letters" targ_sg="DATE_LETTER" targ="&S-VAL;">
  <REL	match="W/#~^[a-z]$"> </REL>
</RULE>

<RULE	name="comma_date_letters" targ="&A-VAL; &B-REW;">
  <REL                match="&COMMA;" m_mod="QUEST" var="A"> </REL>
  <REL	type="REF"  match="date_letters" m_mod="STAR" var="B"> </REL>
</RULE>

<RULE  name="simple_date_numbers" targ_sg="DATE" targ="&A-VAL; &B-REW;">
  <REL	match="W/#~^[12][0-9][0-9][0-9]$" var="A"> </REL>
  <REL	type="REF"  match="comma_date_letters" m_mod="STAR" var="B"> </REL>
</RULE>

The date numbers are simply "1" or "2" followed by three digits, and then zero or more "comma_date_letters" elements. These in turn allow an optional comma in front of each date letter, and wrap the DATE_LETTER tag around the letter itself.

It may be worth noting here that these rules are more awkward than they might otherwise be, in order to get the DATE_LETTER tag in place. As mentioned in the general introduction, SGML rules can only wrap the target tag around the whole match, so in cases such as this, where we would like a sub-part of the object to be tagged, it is necessary to include a separate rule (here the "date_letters" rule) to get the required result. The actual markup for "1990a,b" produced by these rules is:

  <DATE>
    1990
    <DATE_LETTER>a</DATE_LETTER>,
    <DATE_LETTER>b</DATE_LETTER>
  </DATE>
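
The effect of the three rules can be imitated in a few lines of Python. This is a rough sketch that works on a single string rather than on the tokenised input the grammar sees; DATE_RE and tag_date are hypothetical names used only for illustration.

```python
import re

# A year beginning with 1 or 2, then zero or more date letters,
# each optionally preceded by a comma (as in "1967a,b,c").
DATE_RE = re.compile(r"([12][0-9]{3})((?:,?[a-z])*)")


def tag_date(text):
    """Wrap DATE around the whole match and DATE_LETTER around each letter."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    year, letters = m.groups()
    # Only the letters are tagged; the commas are copied through unchanged.
    tagged = re.sub(
        r"[a-z]",
        lambda x: f"<DATE_LETTER>{x.group(0)}</DATE_LETTER>",
        letters,
    )
    return f"<DATE>{year}{tagged}</DATE>"


print(tag_date("1990a,b"))
```

Apart from whitespace, the printed string has the same structure as the markup shown above. The second substitution plays the part of the separate "date_letters" rule: it exists only so that a sub-part of the match can receive its own tag.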

If we only wanted to tag "1990a,b" as a date, with no internal markup, the three rules above could be reduced to just two, with simpler target specifications:

<RULE   name="date_letters" targ="&S-VAL;">
  <REL                match="&COMMA;" m_mod="QUEST"> </REL>
  <REL  match="W/#~^[abcdefghijklmnopq]$"> </REL>
</RULE>

<RULE   name="simple_date_numbers" targ_sg="DATE" targ="&S-VAL;">
  <REL  match="W/#~^[12][0-9][0-9][0-9]$"> </REL>
  <REL  type="REF"  match="date_letters" m_mod="STAR"> </REL>
</RULE>

The result in this case would simply be:

  <DATE>1990a,b</DATE>

Extent information

Extents, as we have seen, carry information about the particular part of a reference which is being cited, as in "Ormond (1978, pp45ff)". There are various forms of these, and in order to mark each explicitly, we use a separate rule for each possibility:

<RULE	name="extent" type="DISJF" targ_sg="EXTENT" targ="&S-REW;">
  <REL	type="REF"  match="chap_ex_numbers" > </REL>
  <REL	type="REF"  match="sec_ex_numbers" > </REL>
  <REL	type="REF"  match="part_ex_numbers" > </REL>
  <REL	type="REF"  match="page_ex_numbers" > </REL>
  <REL	type="REF"  match="ex_numbers" > </REL>
</RULE>

We are thus looking specifically for extent information which is introduced by words such as "chapter", "page", "Sec", and suchlike, and the EXTENT tag will be wrapped around everything. The final disjunct, "ex_numbers", just allows numbers without one of these strings, as in "Lewin (1976:45)". In this case the exact nature of the extent information is unclear - as it must be, of course. Taking the "page_ex_numbers" rule as an example, this is defined as:

<RULE	name="page_ex_numbers" targ_sg="PAGE" targ="&A-VAL; &B-VAL;
                                                       &C-REW;  ">
  <REL              match="W/#~^([Pp]s?|[Pp][Pp]s?|[Pp]ages?)$" var="A" > </REL>
  <REL              match="W[C='FS']" m_mod="QUEST" var="B" > </REL>
  <REL	type="REF"  match="ex_numbers" var="C" > </REL>
</RULE>

Here we have used a regular expression to pick out things like "pp", "Page", and so forth. The "ex_numbers" rule then handles the positional information:

<RULE	name="ex_numbers" targ="&A-REW; &B-VAL; &C-REW; &D-VAL; &E-VAL; ">
  <REL    type="REF"  match="ex_number_comma"  m_mod="STAR" var="A"> </REL>
  <REL                match="&AND;" m_mod="QUEST" var="B" > </REL>
  <REL    type="REF"  match="ex_number" var="C"> </REL>
  <REL                match="&FSTOP;" m_mod="QUEST" var="D"> </REL>
  <REL                match="W[C='CM']" m_mod="QUEST" var="E"> </REL>
</RULE>
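
As an informal check on what the "page_ex_numbers" and "ex_numbers" rules accept, the following Python sketch walks over a list of tokens. It is a deliberate simplification: PAGE_WORD, NUMBER and match_page_extent are hypothetical names, the anchors of the introducer pattern are assumed to apply to every alternative, the number pattern covers only arabic numbers with an optional range or "ff", and the loop is more permissive about commas and "and" than the grammar is.

```python
import re

# Introductory word: "p", "pp", "page", "Pages", and similar forms.
PAGE_WORD = re.compile(r"^(?:[Pp]s?|[Pp][Pp]s?|[Pp]ages?)$")

# A single number, a range such as "23-67", or an open range such as "78ff".
NUMBER = re.compile(r"^[0-9]+(?:-[0-9]+|ff)?$")


def match_page_extent(tokens):
    """Return the number tokens of a page extent, or None if nothing matches."""
    if not tokens or not PAGE_WORD.match(tokens[0]):
        return None
    i = 1
    # Optional full stop after the introductory word, as in "p. 79".
    if i < len(tokens) and tokens[i] == ".":
        i += 1
    numbers = []
    while i < len(tokens):
        tok = tokens[i]
        if NUMBER.match(tok):
            numbers.append(tok)
        elif tok not in {",", "and"}:
            break  # anything else ends the extent
        i += 1
    return numbers or None


print(match_page_extent(["pages", "23", ",", "56", ",", "68", ",", "and", "89"]))
```

Returning None when no number follows the introductory word reflects the fact that the "ex_number" element in the rule is obligatory.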

Note that any number of numbers in the extent expressions is possible - this will allow things like "Collins (1967, pages 23, 56, 68, and 89)". The actual numbers themselves are expressed in much the same way as the page range information is handled in reference lists, one obvious distinction being that we allow roman numerals in the extents to account for things like "Walker (1988, Chs I and II)". The roman numerals themselves are just listed in the lexicon - it would be an interesting little exercise to write a grammar for them.
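
For readers who want to try that exercise outside the grammar formalism, one well-known approach is a single regular expression with one group per decimal place. The Python sketch below is offered as an illustration only, not as a replacement for the lexicon entries; ROMAN and is_roman are hypothetical names, and the pattern covers numerals up to 3999 in either case.

```python
import re

# Thousands, hundreds, tens and units, each in standard subtractive notation.
# The lookahead rejects the empty string, which every group could otherwise match.
ROMAN = re.compile(
    r"(?=[IVXLCDM])"
    r"M{0,3}"
    r"(?:CM|CD|D?C{0,3})"
    r"(?:XC|XL|L?X{0,3})"
    r"(?:IX|IV|V?I{0,3})",
    re.IGNORECASE,
)


def is_roman(token):
    """True if the whole token is a well-formed roman numeral."""
    return ROMAN.fullmatch(token) is not None


for tok in ["I", "II", "iii", "iv", "iiii", "45"]:
    print(tok, is_roman(tok))
```

Because the match ignores case, mixed-case tokens such as "iV" are also accepted; a stricter version would compile separate upper-case and lower-case patterns.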