Tweaked the XSLT as suggested in the previous post, and the Sonnet XHTML output documents now have no validation errors. That's removed over 2000 errors from the 1609 :-). The output is still a bit weird -- margins seem to keep incrementing down through the document -- but it'll be easier to track that down and figure it out now.
The two editions of the Sonnet present quite complicated rendering problems for our site, because their page structure and line-group structure overlap. Specifically, we have line-groups running over many pages, so the conventional way we're handling verse lines for the shorter poems will simply not work: we're using <ul> and <li> elements, but the <fw> tags which appear between lines at page boundaries are block-level elements and need to be rendered using XHTML <div> tags, which cannot appear between the <li> elements inside a single <ul>.
This needs to be carefully handled to avoid disrupting the display of the shorter, simpler poems already on the site. These are my preliminary thoughts on how we might do this:
- Detect the problem when matching <lg> and <l> elements. Specifically, note when <lg> has a child <fw>, or <l> has a parent <lg> with a child <fw>. It might even be better to detect this structure's existence anywhere in the document, because that should dictate that all line groups and lines be handled in the same way throughout (we don't want some handled one way and some another, just because some happen not to contain <fw> tags). So we could set a single variable at the beginning of the rendering process, based on the existence anywhere in the <text> element of an <fw> tag which has a parent <lg>.
- Where this situation is detected, branch the handling of <lg> tags and <l> tags, so that <lg> becomes a <div class="lineGroup">, and <l> becomes a <span class="verseLine"> followed by <br />.
- Provide appropriate CSS so that these render correctly.
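A rough sketch of the plan above, written in Python rather than the site's XSLT so it can be run on its own. It assumes un-namespaced TEI element names, and the `fw` class on the running-header `<div>` is a made-up name for illustration; only `lineGroup` and `verseLine` come from the notes above.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a poem whose line group crosses a page boundary.
SAMPLE = """<text><body>
<lg>
<l>First line</l>
<fw>2</fw>
<l>Second line</l>
</lg>
</body></text>"""


def needs_block_rendering(text_el):
    """True if any <lg> anywhere in the text has a child <fw>."""
    return any(lg.find("fw") is not None for lg in text_el.iter("lg"))


def render_lg(lg, block_mode):
    """Render one line group as an XHTML element."""
    if not block_mode:
        # Conventional handling for the shorter poems: <ul> / <li>.
        ul = ET.Element("ul")
        for line in lg.findall("l"):
            ET.SubElement(ul, "li").text = line.text
        return ul
    # Branch: <div>/<span>/<br/>, so a block-level <fw> can sit between lines.
    div = ET.Element("div", {"class": "lineGroup"})
    for child in lg:
        if child.tag == "l":
            span = ET.SubElement(div, "span", {"class": "verseLine"})
            span.text = child.text
            ET.SubElement(div, "br")
        elif child.tag == "fw":
            # "fw" is a hypothetical class name, not one from the site's CSS.
            ET.SubElement(div, "div", {"class": "fw"}).text = child.text
    return div


text = ET.fromstring(SAMPLE)
# Decide once for the whole document, then use the same mode everywhere.
block_mode = needs_block_rendering(text)
for lg in text.iter("lg"):
    print(ET.tostring(render_lg(lg, block_mode), encoding="unicode"))
```

The point of computing `block_mode` once, up front, is that every line group in a document is then rendered the same way, whether or not that particular group happens to contain an `<fw>`.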
I'm going to start work on that today.
Meeting with CC, EM and LCC about summer work schedules, material for the DHSI workshop they'll be doing, etc. EM is still working steadily on the annotations; LCC will finish the Varin transcription any minute now, and will start work full-time on the markup next week -- I've set aside some time to get her started. More details soon...
PG hit a markup issue: the 1609 Sonnet document has two distinct works in it, and the second of them has as elaborate a title page as the first, occurring in the middle of the document. However, the <titlePage> tag only seems to be able to appear in a <front> tag, which cannot appear after a <body>. I think the obvious solution is shown here, and looks like this:
    <text>
      <front/>
      <group>
        <text>
          <front/>
          <body/>
          <back/>
        </text>
        <text/>
      </group>
    </text>
The frontispiece/title page material that applies to the whole document comes first, in the root text, and then the two component documents are enclosed in a <group> tag, each being a full <text> element.
All the 2004 symposium lectures and their media are now complete. Only four more years to go (2005-2008).
The two issues described here have now been fixed. There are still many issues with the two documents, some of which require PG and LSW to fix markup inconsistencies, but some of which (including many listed in PG's email of March 27) require intervention from me and/or decisions from CC. In particular, we must set the document page widths appropriately for the non-continuous rendering, in the <teiHeader>. I don't want to start on this till I have completed markup from PG and LSW, though; the 1609 doc is currently invalid and ill-formed, so it needs fixing anyway before I can upload recent changes.
This post describes the problem I worked on this morning. It turned out to be much more complicated than I'd anticipated. It wasn't an XSLT problem at heart; it was an XQuery issue. The root of it was this: during the process of creating a list of documents for the contents pages, the XQuery makes copies of the <biblStruct> and <bibl> elements inside the <sourceDesc> of each document. In the process, though, it needs to inject some extra attributes into them, so it can't just copy them to the output; it has to reconstruct them. The trouble is that the <bibl> element content is mixed content (or at least, it was till today) when titles and so on are mentioned. There's no easy way to copy mixed content reliably; you end up getting all the elements followed by all the text, or the elements don't get through at all, or any number of other failure patterns depending on the strategy you take. In the end, I decided on a compromise solution: where the <bibl> element contains mixed content, that content should be wrapped in a single <note> element, so that it's copied intact to the output.
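To make the "all the elements followed by all the text" failure concrete, here is a small Python illustration. It is not the site's XQuery, and the sample `<bibl>`, the `n` attribute and both function names are invented for the example. The first function rebuilds the element piece by piece and loses the interleaving; the second copies the subtree whole and adds attributes to the copy, which is a different technique from the <note> wrapper I actually settled on but shows the same principle of moving the mixed content as one unit.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical <bibl> with mixed content: text, an element, then more text.
BIBL = '<bibl>See <title level="m">Some Title</title>, first edition.</bibl>'


def naive_rebuild(src, extra):
    """Rebuild piecewise: child elements first, then all the loose text.

    This reproduces one failure pattern: the relative order of text
    and elements in the mixed content is lost.
    """
    out = ET.Element(src.tag, {**src.attrib, **extra})
    for child in src:
        ET.SubElement(out, child.tag, dict(child.attrib)).text = child.text
    # Text that sat directly inside src, lumped together after the elements.
    loose = (src.text or "") + "".join(c.tail or "" for c in src)
    if len(out):
        out[-1].tail = loose
    else:
        out.text = loose
    return out


def copy_with_attributes(src, extra):
    """Deep-copy the whole subtree, then add attributes to the copy only."""
    out = copy.deepcopy(src)
    out.attrib.update(extra)
    return out


src = ET.fromstring(BIBL)
print(ET.tostring(naive_rebuild(src, {"n": "1"}), encoding="unicode"))
print(ET.tostring(copy_with_attributes(src, {"n": "1"}), encoding="unicode"))
```

The first line printed has the title ahead of the words that should surround it; the second keeps the original order and leaves the source element untouched.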
Having fixed the output of the 1621 Sonnet document by the addition of a <note> element inside the <bibl>, I then discovered that the XSLT in index.xsl was not expecting any mixed content, so it just copied text to the output on the page. Once I changed the value-of to an apply-templates, I found that there weren't actually any templates for title elements at all (in this case, I needed monograph titles to appear in italics), so I added those to teiGeneral.xsl, along with appropriate classes in the CSS file mariage_layout_typography.xsl.
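The difference between the two XSLT instructions can be imitated in a few lines of Python, purely as an analogy: flattening an element to its string value (roughly what value-of gives you) versus recursing through its children and applying a rule per element (roughly what apply-templates plus a title template gives you). The sample <note> and the `monographTitle` class name are assumptions for the example, not the names used in teiGeneral.xsl or the site's CSS.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical description containing a monograph title.
NOTE = '<note>A copy of <title level="m">Some Title</title> held locally.</note>'


def string_value(el):
    """Flatten to plain text: any markup inside the element is dropped."""
    return "".join(el.itertext())


def render(el):
    """Walk the mixed content in document order, applying per-element rules."""
    parts = [escape(el.text or "")]
    for child in el:
        inner = render(child)
        if child.tag == "title" and child.get("level") == "m":
            # Hypothetical class; the CSS would style it as italic.
            inner = f'<span class="monographTitle">{inner}</span>'
        parts.append(inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


note = ET.fromstring(NOTE)
print(string_value(note))
print(render(note))
```

Elements with no rule of their own simply pass their content through, which mirrors what happened on the page before any title templates existed.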
The difficulties thrown up by what looks like it should be a simple processing issue are typical of a project which has tended to grow organically rather than being planned at full scale before the markup was done. But that's the kind of project we have, for better or worse.
Fixed a couple of typos at CC's request in the Sonnet files, and in the process included her short descriptions of the texts, which go into <bibl> elements in the <sourceDesc>. This threw up a new problem. Up to now, these descriptive bits have been one-line plain text items, so the handling code is not expecting anything like titles or italicized text; these new descriptions include text titles that should be italicized. I've had to remove that from the content, until I can rewrite that code. Adding this as a task...
PG and LW are now comparing and proofing their markup, and as part of the process I've posted the Sonnet texts online. There are many display issues -- most are certainly caused by me (in other words, they're caused by my not having written the required display code for features of these texts which didn't appear in other texts). Some, though, might be due to markup issues, so the RAs can fix anything that's obvious as part of their proofing, and can also flag the other issues. Here are some I already know about, for the record:
- If you go to the 1609 text and search for "Mercure , penſé", you'll see that the following line, which has an <unclear> tag in the XML, just appears as if it were normal; in other words, nothing signals the "unclear" status of the text. This is my fault, because we hadn't used <unclear> before so I haven't written any handler for it.
- I'm displaying the folio page numbers when I shouldn't be -- only page numbers in <fw> tags should be displayed, not those in <pb/>s.
Including this as a task, so I remember to get around to working on these specific items.
As suggested in the previous post, I've now made ids unique across the two documents, rather than just the one.
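A quick way to confirm that kind of uniqueness is to count ids across both files together. The sketch below is only an illustration in Python: it assumes the ids are stored as `xml:id` attributes (they may equally be plain `id`), and the two inline documents and their id values are invented stand-ins for the real files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def shared_ids(*docs):
    """Return, sorted, every id that occurs more than once across the documents."""
    counts = Counter(
        el.get(XML_ID) for doc in docs for el in doc.iter() if el.get(XML_ID)
    )
    return sorted(i for i, n in counts.items() if n > 1)


# Invented stand-ins for the two Sonnet documents.
doc_a = ET.fromstring('<text xml:id="docA"><l xml:id="l1"/><l xml:id="l2"/></text>')
doc_b = ET.fromstring('<text xml:id="docB"><l xml:id="l1"/><l xml:id="l3"/></text>')

print(shared_ids(doc_a, doc_b))
```

An empty list means no id is repeated, either within one document or between the two.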