Pipelines

Two example pipelines are provided for processing bibliographic material: one for reference lists and one for citations. Each is described briefly below.

Pipeline for Reference Lists

To process the example bibliography provided in $TTT/EGS/plain/test-biblio and write the output to an HTML file in the $TTT/OUTPUT/HTML directory, the following command can be used (from the $TTT directory, since the paths are relative):

cat EGS/plain/test-biblio | runbiblio > OUTPUT/HTML/refs.html
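
The same invocation can be wrapped so that it fails early when the input is missing. The following is only a sketch: run_ttt_pipeline is a hypothetical helper, not part of the distribution, and it assumes that the environment variable TTT names the top-level directory and that the pipeline script can be found on the PATH.

```shell
# Sketch only: run_ttt_pipeline is a hypothetical helper, not part of the distribution.
# Usage: run_ttt_pipeline PIPELINE INPUT OUTPUT
# INPUT and OUTPUT are resolved relative to $TTT, because the pipeline
# scripts refer to SCRIPTS/, bin/ and GRAM/ with relative paths.
run_ttt_pipeline() {
    (
        pipeline=$1
        input=$2
        output=$3
        # Abort with a message if TTT is unset or empty.
        cd "${TTT:?set TTT to the top-level directory first}" || exit 1
        # Refuse to run on an unreadable input file.
        [ -r "$input" ] || { echo "cannot read $input" >&2; exit 1; }
        # Create the output directory if it does not exist yet.
        mkdir -p "$(dirname "$output")" || exit 1
        # Feed the input on standard input, as "cat INPUT | PIPELINE" does.
        "$pipeline" < "$input" > "$output"
    )
}
```

Under those assumptions, run_ttt_pipeline runbiblio EGS/plain/test-biblio OUTPUT/HTML/refs.html would be equivalent to the command above. The body runs in a subshell so that the change of directory does not affect the calling shell.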

The file runbiblio contains the main pipeline:

SCRIPTS/bibplain2xml.perl \
| bin/fsgmatch -q ".*/TEXT" GRAM/char/bibparas.gr \
| SCRIPTS/openangle.perl \
| bin/fsgmatch -q ".*/P" GRAM/char/bibwords.gr \
| SCRIPTS/openangle.perl \
| bin/fsgmatch -q ".*/P" GRAM/sgml/pubrules.gr \
| bin/fsgmatch -max_pos 100 -q ".*/P" GRAM/sgml/refrules.gr \
| sgmltrans -r OUTPUT/SCRIPTS/bibtrans \
| OUTPUT/SCRIPTS/bibtrans2html.perl \

See the general section above (Pipelines) and the comments in the file itself for further information. Briefly, the input file is converted to simple XML, split into paragraphs using $TTT/GRAM/char/bibparas.gr, and then tokenised using $TTT/GRAM/char/bibwords.gr. Angle brackets must be explicitly converted during these stages because the grammars operate on character-level material. The remaining grammars are XML-level, and there are two passes through fsgmatch, as described above: the first (pubrules.gr) identifies publication information and the second (refrules.gr) handles names, dates, and titles. Notice that the final call to fsgmatch raises the "max_pos" setting to 100 (the default is 50) in order to deal with long titles. The last two elements in the pipeline convert the material to HTML in two stages, as discussed elsewhere.
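
When studying what each stage contributes, it can help to keep a copy of the stream at some intermediate point. The following is a minimal sketch using the standard tee utility. The helper save_stage and the variable STAGE_DIR are hypothetical names introduced here for illustration. The helper would have to be defined in a private copy of runbiblio, before the pipeline itself.

```shell
# Sketch only: save_stage is a hypothetical helper, not part of the distribution.
# It passes standard input through unchanged and keeps a copy in
# $STAGE_DIR (default: the current directory) under the given file name.
save_stage() {
    tee "${STAGE_DIR:-.}/$1"
}

# Assumed usage in a private copy of runbiblio, here directly after
# tokenisation (the surrounding stages are those of the original file):
#   | bin/fsgmatch -q ".*/P" GRAM/char/bibwords.gr \
#   | SCRIPTS/openangle.perl \
#   | save_stage tokens.xml \
#   | bin/fsgmatch -q ".*/P" GRAM/sgml/pubrules.gr \
```

Because tee copies its input to standard output unchanged, the later stages see exactly the same material as before. The saved file can then be inspected separately.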

Pipeline for Citations

The command which processes the test citation file is very similar to the reference list case:

cat EGS/plain/test-citations | runcitations > OUTPUT/HTML/cits.html

The pipeline in runcitations is:


SCRIPTS/bibplain2xml.perl \
| bin/fsgmatch -q ".*/TEXT" GRAM/char/bibparas.gr \
| SCRIPTS/openangle.perl \
| bin/fsgmatch -q ".*/P" GRAM/char/bibwords.gr \
| SCRIPTS/openangle.perl \
| bin/fsgmatch -q ".*/P" GRAM/sgml/citationrules.gr \
| sgmltrans -r OUTPUT/SCRIPTS/bibtrans \
| OUTPUT/SCRIPTS/bibtrans2html.perl \

The stages here are the same as those in runbiblio, except that there is only one pass through fsgmatch after the tokenisation stage, using $TTT/GRAM/sgml/citationrules.gr.
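
One way to confirm the difference between the two scripts is to reduce each to a list of stages and compare the lists. The sketch below assumes that the scripts are laid out as shown above, with one stage per line, a leading "|" and a trailing backslash. The helper list_stages is a hypothetical name, and the temporary file names are arbitrary.

```shell
# Sketch only: list_stages is a hypothetical helper, not part of the distribution.
# It prints one pipeline stage per line, dropping comment lines, the
# leading "| ", the trailing " \" continuation marker, and empty lines.
list_stages() {
    sed -e '/^#/d' \
        -e 's/^[[:space:]]*|[[:space:]]*//' \
        -e 's/[[:space:]]*\\$//' \
        -e '/^$/d' "$1"
}

# Assumed usage (adjust the paths to wherever the two scripts live):
#   list_stages runbiblio    > /tmp/biblio.stages
#   list_stages runcitations > /tmp/citations.stages
#   diff /tmp/biblio.stages /tmp/citations.stages
```

If the scripts match the listings given here, the comparison should show only the fsgmatch calls that follow tokenisation.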