Overall document structure
Posted by mholmes on 04 Dec 2009 in Hit by a bus
Overall document structure is probably best discovered simply by examining the existing files, but major markup features and practices are enumerated below.
<teiHeader> and <front>
- Documents are named for author and volume, with an additional integer in the middle which differentiates documents when the same author has more than one in the same volume. For instance, sayers_1_18.xml means "Sayers's first document in volume 18".
- The XML file name, minus extension, is also used as the @id attribute on the root <TEI.2> tag (<TEI.2 id="sayers_1_18">).
- The root element may also carry an @rend="proof_only" attribute. When this is present, the document will not show up in the main TOC of the site; instead it will appear on the "Proofing" page, where documents are made available to editors and authors for proofing before publication. When a document is deemed published or fit for publication, its @rend attribute can be removed, promoting it to the main TOC.
- The <front> element has a <docTitle> element that looks like this:
    <docTitle n="Norwave: Norwegian Cinema 1997-2006">
      <titlePart type="Main">Norwave: Norwegian Cinema 1997-2006</titlePart>
      <titlePart type="Running">Norwave: Norwegian Cinema 1997-2006</titlePart>
    </docTitle>
  The @n attribute provides a markup-free version of the title, which can be used in processing without fear that it contains tags (whereas the other titles may contain markup). The main title is used on the first page of the article, and the running title at the head of pages in the PDF/print version. The running title is often shorter than the main title; it must be created in consultation with the editor, and tested to make sure it will fit in the output.
- Following <docTitle> in <front>, we find:
    <docAuthor><name key="holmes_martin" reg="Holmes, Martin">Martin Holmes</name> has been hanging around the University of Victoria for nearly fifteen years, doing various menial jobs. E-mail: <xptr to="my@e.mail" type="email"/>. </docAuthor>
  This is the "long bio" which shows up on the website when you look at the Contributors page, or in a printed volume on its Contributors page. Note the need to supply the name in regularized format, and also in the form of a key. Throughout the header, the same formulation is used for a name. This is repetitive, but it makes for easy processing whenever a name is retrieved.
- Finally, in <front>, we have a "short bio":
    <titlePart type="short_affil">Martin Holmes is a layabout at the University of Victoria.</titlePart>
  This is used at the bottom of the article title page (in the case of an article) or at the bottom of a review. NOTE: The format of this will differ in each case. For an article, it needs to be a full sentence, starting with the author's name; for a review, it follows the author's name on the next line, and so it has to be a noun phrase. Look at existing markup for examples.
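The naming and root-attribute conventions above lend themselves to a quick sanity check. What follows is only a rough sketch in Python, not part of the project's actual tooling: the helper names, the regular expression (which assumes author names are lowercase letters, possibly joined by underscores) and the sample document are all invented for illustration. Real files may also carry a DOCTYPE and entity references that this simple parse does not handle.

```python
import os
import re
import xml.etree.ElementTree as ET

# Assumed pattern: author, sequence number, volume, e.g. sayers_1_18.xml.
# The exact character set allowed in the author part is a guess.
FILE_NAME = re.compile(
    r"^(?P<author>[a-z]+(?:_[a-z]+)*)_(?P<seq>\d+)_(?P<volume>\d+)\.xml$"
)


def parse_file_name(file_name):
    """Split a document file name into (author, sequence, volume)."""
    match = FILE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return match["author"], int(match["seq"]), int(match["volume"])


def describe_document(file_name, xml_text):
    """Report the root-level conventions for one document."""
    root = ET.fromstring(xml_text)
    doc_title = root.find(".//front/docTitle")
    return {
        # The root @id should equal the file name minus its extension.
        "id_matches_file": root.get("id") == os.path.splitext(file_name)[0],
        # rend="proof_only" keeps the document off the main TOC.
        "proof_only": root.get("rend") == "proof_only",
        # @n holds the markup-free version of the title.
        "plain_title": doc_title.get("n") if doc_title is not None else None,
    }


# Invented sample: it pairs the file name and title used as examples above
# purely to exercise the helpers.
SAMPLE = """<TEI.2 id="sayers_1_18" rend="proof_only">
  <text>
    <front>
      <docTitle n="Norwave: Norwegian Cinema 1997-2006">
        <titlePart type="Main">Norwave: Norwegian Cinema 1997-2006</titlePart>
        <titlePart type="Running">Norwave: Norwegian Cinema 1997-2006</titlePart>
      </docTitle>
    </front>
  </text>
</TEI.2>"""

if __name__ == "__main__":
    print(parse_file_name("sayers_1_18.xml"))
    print(describe_document("sayers_1_18.xml", SAMPLE))
```

Running something like this over a directory of files would flag any document whose @id has drifted from its file name, and list which ones are still marked proof_only.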
Structure of <body>
- The body is divided into <div0> elements, one for each major section. If there is only one section, there is only one <div0>.
- Each <div0> can have a single <head> tag at the top of it, containing a heading for the section.
- Content inside the <div0> tags, following any <head>, consists mainly of <p> (paragraph) tags.
- The first letter of the first <p> of the first <div0> is usually formatted as a large drop-capital in a fancy font for the print version. This is marked up thus:
    <p><hi rend="DropCap">A</hi>lthough its subtitle identifies it ...</p>
  Where the article starts with an inline quotation, both the opening quote and the first letter are included in the drop-cap tag. This means that the opening quotation cannot be marked up as a quote in the normal way, which is unfortunate; actual (smart) quotes are used in this situation. Here aesthetics trumps coding rigour.
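To make the shape of <body> concrete, here is a small Python sketch that outlines the sections and picks out the drop-cap. It is an illustration only: the function names and the sample body (including the "Introduction" heading) are invented, and the code assumes <div0> elements are direct children of <body> as described above.

```python
import xml.etree.ElementTree as ET


def outline_body(xml_text):
    """List each <div0> with its heading (if any) and paragraph count."""
    body = ET.fromstring(xml_text)
    sections = []
    for div in body.findall("div0"):
        head = div.find("head")
        sections.append({
            # itertext() flattens any markup inside the heading.
            "head": "".join(head.itertext()).strip() if head is not None else None,
            "paragraphs": len(div.findall("p")),
        })
    return sections


def drop_cap(xml_text):
    """Return the drop-cap text of the first paragraph, or None."""
    body = ET.fromstring(xml_text)
    first_p = body.find("div0/p")
    if first_p is None:
        return None
    children = list(first_p)
    # The drop-cap <hi> must open the paragraph, with no text before it.
    if (children
            and not (first_p.text or "").strip()
            and children[0].tag == "hi"
            and children[0].get("rend") == "DropCap"):
        return children[0].text
    return None


# Invented sample following the conventions described above.
SAMPLE_BODY = """<body>
  <div0>
    <head>Introduction</head>
    <p><hi rend="DropCap">A</hi>lthough its subtitle identifies it ...</p>
    <p>Second paragraph.</p>
  </div0>
  <div0>
    <p>A section without a heading.</p>
  </div0>
</body>"""

if __name__ == "__main__":
    print(outline_body(SAMPLE_BODY))
    print(drop_cap(SAMPLE_BODY))
```

When an article opens with a quotation, the returned drop-cap text would contain both the opening smart quote and the first letter, since both sit inside the same <hi> tag.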
Structure of <back>
The <back> element contains the bibliography (sometimes more than one, if they're divided into sections). The structure looks basically like this:
    <back>
      <div type="Bibliography">
        <head>REFERENCES</head>
        <listBibl>
          <biblStruct>...</biblStruct>
        </listBibl>
      </div>
    </back>
The content of <biblStruct>s is complicated, and documented with lots of examples in its own blog posting.
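As a rough illustration of the <back> layout, the following Python sketch counts the entries in each bibliography section. The function name and the sample data are invented, and the <biblStruct> elements are left empty because their content is documented elsewhere.

```python
import xml.etree.ElementTree as ET


def count_bibliography_entries(xml_text):
    """Return (heading, entry count) for each bibliography <div> in <back>."""
    back = ET.fromstring(xml_text)
    counts = []
    for div in back.findall("div[@type='Bibliography']"):
        head = div.find("head")
        title = (head.text or "").strip() if head is not None else ""
        counts.append((title, len(div.findall("listBibl/biblStruct"))))
    return counts


# Invented sample with placeholder entries.
SAMPLE_BACK = """<back>
  <div type="Bibliography">
    <head>REFERENCES</head>
    <listBibl>
      <biblStruct/>
      <biblStruct/>
    </listBibl>
  </div>
</back>"""

if __name__ == "__main__":
    print(count_bibliography_entries(SAMPLE_BACK))
```

A document with its bibliography divided into sections would simply yield one tuple per <div type="Bibliography">.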