Following a discussion on the TEI list, I've added a couple of pipelines to extract all the @rend attributes (which are now pure CSS, at least within the <text> element) and format them as a CSS stylesheet which can be passed to the Jigsaw validator, for checking.
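As a rough illustration of what such a pipeline does (this is not the actual pipeline code, just a minimal Python sketch), the idea is to collect the distinct @rend values under <text> and wrap each one in a throwaway rule so that a CSS validator has something to parse. The function name, the ".rendN" selector names and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def rend_to_stylesheet(tei_xml: str) -> str:
    """Collect distinct @rend values under <text> and wrap each in a dummy CSS rule."""
    root = ET.fromstring(tei_xml)
    text = root.find(f"{{{TEI_NS}}}text")
    if text is None:
        return ""
    seen = []
    for el in text.iter():
        rend = el.get("rend")
        if rend and rend.strip() not in seen:
            seen.append(rend.strip())
    # The selector names are arbitrary (an assumption); only the declarations matter.
    rules = [f".rend{i} {{ {value} }}" for i, value in enumerate(seen, start=1)]
    return "\n".join(rules)


# Hypothetical sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text><body>
    <p rend="text-indent: 3em;">A <hi rend="font-style: italic;">word</hi>
    and <hi rend="font-style: italic;">another</hi>.</p>
  </body></text>
</TEI>"""

print(rend_to_stylesheet(SAMPLE))
```

The resulting text could then be saved as a .css file and submitted to the validator; duplicate @rend values are collapsed so each declaration set is checked only once.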
The "corpus output" (a composite file consisting of a collection of individual contributions -- an issue, or an ad-hoc anthology) now has a complete <teiHeader> element, created by cherry-picking and combining various bits and pieces from the component documents. This is rather a complicated business, and is not always precisely what you want -- for instance, if you combine documents from two vol/issue sets, you'll get idno elements for each volume and issue, but no clear indication of which volume number goes with which issue number. Still thinking about that.
HB wrote to ask for a keystroke shortcut for "New (with same categories)", so I've added one. I also updated the version info and date on the Help.
Covering lates in Greg's absence, and going on to the TRUTH meeting at 7pm.
Met with MJ about the use of eXist for searching in the ISE project. I wrote some quick demo code for searching using eXist years ago, and it's been in the project ever since; MJ is planning a proper implementation, using two or possibly more versions of the text for different types of search.
Over the last couple of days, documents are reaching completion, and I've been revisiting the markup and rendering as I start looking at them on the site. After rewriting some of the XSLT and CSS, and normalizing some of the markup, I've sent this message to the team (reproduced here because it documents some policy changes):
I've started working on the rendering of the markup in the documents you've been working on, starting with Emily's text.
I came up against a number of issues which have led me to revise some of the markup Emily and I came up with on Tuesday, so I think we'll need to look again at Leanna's title page and frontispiece markup with these issues in mind. I've decided on a few policy changes:
@rend ATTRIBUTES ARE CSS:
When we're describing the rendering of textual elements, with the @rend attribute (<hi rend="xxx">), the value of the @rend attribute should always be CSS (Cascading Style Sheets) code. This is something that we're already doing in many cases, such as "text-indent: 3em" and "font-variant: small-caps", but in other cases we were using old TEI formulations. This means that you'll need to get familiar with some of the most common CSS code. Here are some simple examples:
- rend="font-size: 200%"
- rend="font-variant: small-caps;"
- rend="font-style: italic;"
- rend="font-weight: bold"
- rend="vertical-align: super;" (=superscript)
- rend="vertical-align: sub;" (=subscript)
- rend="text-indent: 3em;" (=first line indent in a para)
- rend="text-align: center;"
You can combine these together with semi-colons, like this:
<hi rend="font-style: italic; font-size: 200%;">
I think you'll get the idea fairly quickly, although this can get quite complicated -- for drop-capitals, I ended up with this:
rend="float: left; font-size: 200%; margin: 0; padding: 0; line-height: 90%;"
so feel free to ask me about this. There's a good intro to CSS here.
It's widely used for styling Web pages, and in other contexts, so it's worth learning.
FIGURES:
Decorative figures, lines on the page, etc. (graphical elements in the text) are described like this:
<figure> <figDesc>Text describing the figure</figDesc> </figure>
There's no need to use square brackets in the <figDesc>; I supply those during the rendering process. The descriptive text should be in French, of course.
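To show what "supplying the brackets during rendering" amounts to, here is a minimal Python sketch. It is not the real rendering code; the function name and the sample French description are hypothetical, and it assumes a <figure> fragment without a namespace.

```python
import xml.etree.ElementTree as ET


def render_figure(figure_xml: str) -> str:
    """Return the <figDesc> text wrapped in square brackets, or '' if there is none."""
    figure = ET.fromstring(figure_xml)
    desc = figure.findtext("figDesc", default="")
    desc = " ".join(desc.split())  # collapse stray whitespace and line breaks
    return f"[{desc}]" if desc else ""


# Hypothetical description, for illustration only.
print(render_figure("<figure> <figDesc>Bandeau décoratif</figDesc> </figure>"))
```

Because the brackets are added here, typing them into the <figDesc> as well would produce doubled brackets in the output.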
PAGE NUMBERS:
Generally page numbering is straightforward:
<pb n="25" />
but we've come across cases where it's not -- where the page number is wrong, for instance, because of a typesetting error. Up to now, those instances have been handled like this:
<pb n="146" rend="164" />
meaning "page number 146; rendered wrongly as 164". However, there's a long sequence of page number errors in Leanna's text which are caused by numbers 18 and 19 being repeated; this means that most of the page numbers are "wrong" in the text. This causes me to wonder whether page numbers in our texts are best viewed as "labels" (i.e. they should be reproduced in the rendering exactly as they are, without our worrying about what they "should be"), or whether they're structural data, and we should number the pages as we think they should be numbered, and provide a note to the effect that the original text has an "erroneous" number. Claire, what do you think?
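To make the two options concrete, here is a small Python sketch of how each view would display the same <pb>. It assumes the current encoding (@n for the number we think is right, @rend for the number as printed); the function name, the mode names and the wording of the note are hypothetical.

```python
import xml.etree.ElementTree as ET


def page_label(pb_xml: str, mode: str = "label") -> str:
    """Return the page number to display for a <pb> under one of the two views."""
    pb = ET.fromstring(pb_xml)
    n = pb.get("n", "")
    printed = pb.get("rend") or n  # no @rend means the printed number is correct
    if mode == "label":
        # "Label" view: reproduce whatever the page actually says.
        return printed
    # "Structural" view: show our number and note the printed one if it differs.
    if printed != n:
        return f"{n} [printed as {printed}]"
    return n


print(page_label('<pb n="146" rend="164" />'))                     # label view
print(page_label('<pb n="146" rend="164" />', mode="structural"))  # structural view
```

Either way the encoding keeps both numbers, so the choice mainly affects what the reader sees.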
LINE BREAKS:
In the prose novels, lines are incredibly short, and we've been (correctly) inserting all the appropriate line breaks as <lb/> tags. However, for reading on screen, this is frankly weird (see Emily's text on the Web site), so I'm wondering if the display of the novels should ignore line breaks, giving us a more "normal" rendering like the other prose texts.
What do you think? (We could also make this optional, allowing the reader to switch linebreaks on and off on the page.)
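A minimal Python sketch of the optional behaviour, assuming a namespace-free <p> and treating each <lb/> as either a real line break or a single space. The function name and the sample sentence are hypothetical, and words hyphenated across a line break are not handled.

```python
import xml.etree.ElementTree as ET


def paragraph_text(p_xml: str, keep_breaks: bool) -> str:
    """Flatten a <p> to text, turning each <lb/> into a newline or a space."""
    p = ET.fromstring(p_xml)
    parts = [p.text or ""]
    for child in p:
        if child.tag == "lb":
            parts.append("\n" if keep_breaks else " ")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical sample paragraph, for illustration only.
SAMPLE_P = "<p>Il était<lb/>une fois<lb/>un roi.</p>"
print(paragraph_text(SAMPLE_P, keep_breaks=False))
print(paragraph_text(SAMPLE_P, keep_breaks=True))
```

The <lb/> tags stay in the markup in both cases; only the display changes, which is what would let the reader switch them on and off.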
ANNOTATIONS ON THE DOCUMENT:
There are some instances of text written on the document by hand (such as the BL... number on the title page of Emily's text). I'm not sure yet how we should handle those, but if you've seen any in your own text, please let me know what they are, so I can get a better idea of how many and varied they are.
MF sent two reports of emails that were rejected. The first appears to be a case where the user submitted an erroneous return email address, and our system eventually gave up trying and reported that failure.
The second appears to be a case of someone spoofing the admin@ and webmaster@ accounts and somehow getting those addresses put on a blacklist somewhere. It seems to be a pretty localized thing, as I have no other problem reports (which I'd expect if those accounts had suddenly been universally blacklisted, or suspended by the sys-admins).