I've gone through Amboise 1 to add all the remaining <fw> elements (catchwords and sigs), fix some missing line-breaks, remove spaces before linebreaks (so that removal of hyphenation works correctly), add the last couple of elements (the "Fin..." line and the library stamp on the last page), and a bit of other tidying up. Amboise is now "complete" (meaning it needs to be proofed against its web view and PDF).
Category: "Activity log"
As we get towards the end of the markup of the first novel, I've worked carefully through the Amboise vol 2 again, and done a number of things:
- Automated the markup of page numbers and running titles, based on the old page numbers in @n attributes of <pb> tags. This will save LW a lot of time. I've also run the same code on vol 1, to help TG.
- Manually marked up the remainder of the "sig" and catchword strings in vol 2, partly to get it finished, but mainly to shake down and normalize all the relevant markup practices. They're detailed below.
- Fixed a few oddities and typos.
- Added more CSS and XSLT updates, mainly to normalize long s characters to regular s in the "continuous view" (which is becoming the "more modern view"), and to remove trailing hyphens and leading spaces from linebreaks. There are 998 instances of words hyphenated across linebreaks in the text, and a quick scan through showed none that I could see that ought to retain their hyphens, so I've gone for a global solution which may result in a few hyphens disappearing where they ought not to, but which is certainly better than a hyphen+space interrupting a word 998 times.
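As a rough illustration of the long-s normalization and the global dehyphenation described in the list above, here is a minimal sketch in Python. The site itself does this in XSLT and CSS; the function name, the regular expressions, and the assumption that line ends are encoded as <lb/> in a serialized fragment are all mine, for illustration only.

```python
import re

# Hypothetical names throughout; this is not the project's XSLT.
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# A hyphen, optional whitespace, an <lb/> tag, then any whitespace
# (including the newline) that opens the next line.
HYPHEN_BREAK = re.compile(r"-\s*<lb\s*/>\s*")
# Any remaining <lb/> with no hyphen before it.
PLAIN_BREAK = re.compile(r"\s*<lb\s*/>\s*")


def to_continuous(fragment: str) -> str:
    """Return a continuous-view version of a paragraph's inner markup."""
    text = fragment.replace(LONG_S, "s")   # long s becomes regular s
    text = HYPHEN_BREAK.sub("", text)      # rejoin words split across lines
    text = PLAIN_BREAK.sub(" ", text)      # ordinary line ends become spaces
    return text.strip()


if __name__ == "__main__":
    print(to_continuous("il l'o-<lb/>\nbligea de \u017fe <lb/>rendre"))
```

Like the global solution described above, this drops every hyphen that precedes a linebreak, so a compound that genuinely needs its hyphen at a line end would lose it.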
These are some standard formats I've used for the forme works, included here for documentation and reference purposes:
- Catchwords in these texts are always bottom-right, so they're marked up as follows, following a regular line break:
<fw type="catchword" place="bot-right" rend="float: right;">bligea</fw><lb/>
The @place attribute is only there for tradition, really; the whole rendering instruction is in the @rend attribute.
- Signature labels (type 1): There are two types of signature numbers, one of which is simply the number in roman numerals (this is the only kind which appears in volume one). Again, these follow the last line break:
<fw type="sig" place="bot-right" rend="float: right; margin-right: 3em;">A iiij</fw>
As above, the @place attribute is really superfluous. The @rend attribute captures the fact that the sig identifier appears floated right, but is slightly indented from the right margin.
- Signature labels (type 2): The second type of signature marks the beginning of a signature, and is more complex; it appears only in volume 2:
<fw type="sig" place="bot-center" rend="text-align: center; margin-left: auto; margin-right: auto;"><hi rend="font-style: italic;">II. Part.</hi><space quantity="4" unit="em"/>B</fw>
The whole (treated as one line) is roughly centered, but the two components are separated by a space of about 4 ems (all measurement is done in ems, for simple scalability). The margin settings are there as a conventional way to express the fact that the block is not full-width, and is located in the centre (and this CSS can be passed straight to the browser in the rendering code, to get the effect we want).
- Page numbers: Whether right or left (recto or verso respectively), these come first after the page break. That has two advantages: first, we know where they are reliably, programmatically, and second, they will render correctly floated, alongside the centred running title:
<fw type="pageNum" place="top-left" rend="float: left;">18</fw>
- Running titles: These are always encoded after the page number, even if the page number is on the right, for the reasons stated above:
<fw type="head" place="top-centre" rend="text-align: center; margin-left: auto; margin-right: auto;"><hi rend="font-style: italic;">Le Comte</hi></fw>
As with other <fw> tags, the @rend is the key attribute, expressing the fact that this is a part-width, centred block that renders on the same line as the floated page number.
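For reference, the page-number format above could be generated from the @n attribute of each <pb> along these lines. This is only a Python sketch, not the code actually used: the function names are hypothetical, and it assumes that even page numbers are versos (so the number sits top-left) and odd ones rectos (top-right), and that every <pb> has a purely numeric @n and no other attributes.

```python
import re

# Matches only the simple form <pb n="18"/>; anything else is left alone.
PB = re.compile(r'<pb\s+n="(\d+)"\s*/>')


def page_number_fw(n: str) -> str:
    """Build the <fw type="pageNum"> element for page number n."""
    # Assumption: even = verso = left, odd = recto = right.
    side = "left" if int(n) % 2 == 0 else "right"
    return f'<fw type="pageNum" place="top-{side}" rend="float: {side};">{n}</fw>'


def add_page_numbers(xml: str) -> str:
    """Insert a page-number <fw> immediately after each matching <pb>."""
    return PB.sub(lambda m: m.group(0) + page_number_fw(m.group(1)), xml)


if __name__ == "__main__":
    print(add_page_numbers('<pb n="18"/><lb/>premier mot'))
```

The sketch is not idempotent: running it twice over the same text would add a second <fw> after each page break.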
Added a new "Petits romans" menu item, resulting in a contents page for the novels. Then started hacking more seriously at the prose display, both the page-based and continuous modes. I came across a validation problem, caused by <div> elements ending up inside <h2> tags due to TEI <fw> tags appearing inside <head>s in the HTML source. I've added some testing for this kind of condition in the XSLT, so that <span>s are used in this kind of context, with the class attribute invoking CSS which displays them as blocks anyway, so that the rendering is not affected; the result is that the XHTML validates, but the page still looks right.
I also came across a slightly thorny problem worth blogging. Paragraphs in the novels have text-indent settings, specified in the XML and passed into the CSS. When elements displayed as blocks, such as the <fw> tags that produce page numbers, occur within a paragraph (as they almost always do), the block element inherits the text-indent setting from its parent, and so indents its own text. This is avoided by explicitly setting the text-indent value to zero on the classes of these block elements.
Finally, I tweaked the right margin of the continuous-view texts so that there's enough space for the note popup to appear. This makes the lines shorter anyway, which makes reading easier.
I still don't have a definitive solution to the hyphens indicating word-breaks across lines, which should be eliminated in the continuous view. Still thinking about that one.
TG has got through the full text of the first volume of Amboise, and is now working on the forme works etc., so I added it into the database to see what it would look like. As a result, I made a few changes to the text itself (mainly rearranging bits of the front matter, which is longer and more complex than the front matter in volume 2), and also revised some of the rendering code to get more consistent results.
I think we might now consider the possibility of placing images of the figures on the site, and weaving them into the text, instead of just rendering the descriptions we currently have. We could capture the figures from the original PDFs, on a white background.
The novel text we have so far (Amboise) has very short lines on very small pages. We are accurately reproducing that layout of the document, but it's not ideal for screen reading, so I wanted to provide another option for readers who want to see a continuous text.
I've implemented a parameter switch called proseView, which changes some of the XSLT processing to remove forme works, page breaks, page numbers, and linebreaks inside paragraphs. I've also added a switching mechanism, in the form of a pair of links, at the top of the page. This code is only invoked if the <classCode> contains "prose" or "roman"; it's ignored for verse. That might change in future -- although verse does need its linebreaks, it might be beneficial to be able to turn off the page breaks. The switch has text in English right now; I'm waiting for the appropriate French labels.
EM has started work on adding reference material for all the classical allusions etc. in the Sonnet (1621 version). I've created a file called references.xml, which EM is editing; it has a separate <div> for each "topic", with a unique @xml:id attribute. References in the text are linked like this:
<ref type="reference" target="references.xml#topicId">blah</ref>
Once we have a fully marked-up document, I'll be able to write the handling code (intercepting the handling of existing <ref> elements, which have no @type attribute, and which handle cross-references in editorial notes).
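The handling code does not exist yet, but the lookup it will need can be sketched. The following is a minimal Python illustration, not the eventual XSLT; the function names are hypothetical, and elements are matched by local name so that it works whether or not the file declares a namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the built-in xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(element) -> str:
    """Element name without any {namespace} prefix."""
    return element.tag.rsplit("}", 1)[-1]


def is_reference_link(ref) -> bool:
    """New-style links carry type="reference"; older <ref>s have no @type."""
    return ref.get("type") == "reference"


def find_topic(references_root, target: str):
    """Return the <div> that target (e.g. references.xml#topicId) points at."""
    _filename, _, topic_id = target.partition("#")
    if not topic_id:
        return None
    for element in references_root.iter():
        if local_name(element) == "div" and element.get(XML_ID) == topic_id:
            return element
    return None


if __name__ == "__main__":
    root = ET.fromstring(
        '<body><div xml:id="topicId"><p>Example topic text</p></div></body>'
    )
    ref = ET.fromstring(
        '<ref type="reference" target="references.xml#topicId">blah</ref>'
    )
    if is_reference_link(ref):
        print(find_topic(root, ref.get("target")).find("p").text)
```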
Our RAs are beginning to mark up forme works (running titles, page and folio numbers, etc.), so I'm adding in some handling for those on the site. I've taken the opportunity to do some more thinking with regard to the integration of CSS. These are some points, exemplified in the Amboise 2 text:
- <fw> tags for page numbers had best precede those for running titles, even if they appear on the right of them, because they're essentially floated, while the running titles are centred. This is an easy convention to work with, and makes rendering much easier too.
- The dimensions of the actual pages, rendered in ems, are now stored in a <rendition> element in the header:
<tagsDecl>
  <rendition xml:id="layoutDesc" scheme="css">width: 15em; padding-left: 3em; padding-right: 3em;</rendition>
  <rendition xml:id="pageBreakMargins" scheme="css">margin-left: -3em; margin-right: -3em;</rendition>
</tagsDecl>
- These are linked in a general-purpose way like this:
<text rendition="#layoutDesc">...</text>
and in a specific way for the page-break elements (this is rather a kludge, but it allows us to put page-break lines which are the right width):
<xsl:template match="pb">
  <xsl:element name="div">
    <xsl:attribute name="class">pageBreak</xsl:attribute>
    <xsl:if test="//rendition[@xml:id='pageBreakMargins']">
      <xsl:attribute name="style"><xsl:value-of select="//rendition[@xml:id='pageBreakMargins']" /></xsl:attribute>
    </xsl:if>
    <xsl:value-of select="@n" />
  </xsl:element>
</xsl:template>
There may be a much more generic way of doing this, but I haven't thought of it yet. I could have a convention that links a specific <rendition> element through its xml:id to all instances of a particular tag.
- <fw> tags are styled and positioned with @rend. We'll probably dispense with @place.
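To make the rendition lookup concrete, here is a small Python sketch of resolving a @rendition pointer such as #layoutDesc to the CSS held in the header. It is an illustration only, not the site's XSLT; the function names are hypothetical. It allows for several whitespace-separated pointers in one attribute, which TEI permits.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the built-in xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def rendition_css(header, pointer: str):
    """Return the CSS text of the <rendition> that pointer refers to."""
    wanted = pointer[1:] if pointer.startswith("#") else pointer
    for element in header.iter():
        # Compare local names so a namespaced header also works.
        name = element.tag.rsplit("}", 1)[-1]
        if name == "rendition" and element.get(XML_ID) == wanted:
            return (element.text or "").strip()
    return None


def style_for(header, rendition_attr: str) -> str:
    """Join the CSS of every pointer listed in a @rendition attribute."""
    found = (rendition_css(header, p) for p in rendition_attr.split())
    return " ".join(css for css in found if css)


if __name__ == "__main__":
    header = ET.fromstring(
        "<tagsDecl>"
        '<rendition xml:id="layoutDesc" scheme="css">width: 15em; '
        "padding-left: 3em; padding-right: 3em;</rendition>"
        '<rendition xml:id="pageBreakMargins" scheme="css">margin-left: -3em; '
        "margin-right: -3em;</rendition>"
        "</tagsDecl>"
    )
    print(style_for(header, "#pageBreakMargins"))
```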
This is still very much a work in progress, but we're getting there. Another couple of days should result in some clear guidelines and usable output formats.
Three of our RAs were working today, and we worked out a lot of issues that relate to forme work. Page numbers will be marked up using <fw> tags, along with running headers, and printer's marks such as signatures at the bottom of the page. Arguments -- topic labels in the margin -- will be done with <argument>, at the beginning of the line group to which they apply, and positioned with CSS relative to that location.
After some discussion with PG and with the TEI list, it seems that we should be doing this:
- Marking page breaks using a plain <pb> tag.
- Marking up the folio numbers (they're not page numbers if they cover a folio) using <fw type="folioNum">29</fw>.
- Since we'll now be marking up "forme works", we should probably also be marking up the running titles using the same tag. That can be largely automated.
- Handling the printers' marks at the bottom left of folios in the same way.
Following a discussion on the TEI list, I've added a couple of pipelines to extract all the @rend attributes (which are now pure CSS, at least within the <text> element) and format them as a CSS stylesheet which can be passed to the Jigsaw validator, for checking.
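The idea behind those pipelines can be sketched in a few lines of Python. This is an illustration only, not the pipelines themselves: the function name and the generated .rendN class names are hypothetical. It gathers each distinct @rend value and wraps it in a dummy rule so that the result is a complete stylesheet a CSS validator can parse.

```python
import xml.etree.ElementTree as ET


def rend_stylesheet(text_root) -> str:
    """Build a stylesheet with one dummy rule per distinct @rend value."""
    seen = []
    for element in text_root.iter():
        rend = (element.get("rend") or "").strip()
        if rend and rend not in seen:
            seen.append(rend)  # keep first-seen order, skip duplicates
    # The class names are placeholders; only the declarations matter.
    return "\n".join(
        f".rend{i} {{ {declarations} }}"
        for i, declarations in enumerate(seen, 1)
    )


if __name__ == "__main__":
    sample = ET.fromstring(
        "<text>"
        '<fw type="pageNum" rend="float: left;">18</fw>'
        '<fw type="catchword" rend="float: right;">bligea</fw>'
        "</text>"
    )
    print(rend_stylesheet(sample))
```

Collapsing duplicates keeps the stylesheet short, at the cost of not reporting which element a faulty declaration came from.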