The next incarnation of our project Le mariage sous L'Ancien Régime is now live at mariage.uvic.ca. It's still officially a beta version, but all the pieces are in place.
Today I worked through a stack of issues in building and validating the site, and I now have some recommendations and insights worth recording.
First, I determined that vnu was parsing our documents as HTML because they had the .html extension. The HTML parser does a number of things before validation (such as lower-casing the names of custom data attributes) that we would prefer to avoid. I also discovered that using the XHTML output method in Saxon was, paradoxically, adding a meta tag to the head specifying the content type as text/html. This was also pushing vnu into treating the documents as HTML rather than XHTML. Solutions:
- Use this for the xsl:output element:
<xsl:output method="xhtml" include-content-type="no" encoding="UTF-8" omit-xml-declaration="yes" normalization-form="NFC"/>
(exclude-result-prefixes="#all" is not an xsl:output attribute; put it on the xsl:stylesheet element instead.)
The xhtml method means that empty elements such as div are written with separate start and end tags rather than self-closed (<div/>), which an HTML parser would misread. The include-content-type="no" value suppresses the unwanted meta tag with the wrong content type.
- Output the HTML5 doctype like this:
<xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>
It's ugly, but it works.
- Always include the charset meta tag:
<meta charset="UTF-8"/>
- Before validating, copy only the HTML files to a fresh, empty directory and validate them there, because of the --skip-non-html problem explained below.
- For validation using vnu.jar, set this Java system property (on the java command line it must come before -jar):
-Dnu.validator.client.content-type=application/xhtml+xml
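In an Ant task, it looks like this:
<java jar="utilities/vnu/vnu.jar" failonerror="true" fork="true">
  <arg value="-Dnu.validator.client.content-type=application/xhtml+xml"/>
  <arg value="--format"/>
  <arg value="text"/>
  <arg value="--skip-non-html"/>
  <arg value="tmpValidation/"/>
</java>
The problem is that when you set the content type as in the first argument, the --skip-non-html flag no longer seems to work, and vnu sets about validating every JPEG and JavaScript file in the tree. I think this must be a vnu bug, but I haven't tested thoroughly yet. That said, Ant passes each <arg value> as exactly one argument, and passes it to vnu itself rather than to the JVM. That is why --format and text are separate above, and it means a -D setting given as an <arg> never becomes a Java system property. Trying <jvmarg value="-Dnu.validator.client.content-type=application/xhtml+xml"/> instead may be worthwhile.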
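The copy and validation steps above could be combined into a single Ant target along these lines. This is a minimal sketch, not our actual build file. The target name validateHtml and the site/ source directory are hypothetical, and it assumes that vnu.jar picks up the content-type setting when it is passed as a JVM system property via <jvmarg>.

```xml
<!-- Hypothetical target: validateHtml and site/ are illustrative names. -->
<target name="validateHtml">
  <!-- Start from an empty scratch directory each time. -->
  <delete dir="tmpValidation" quiet="true"/>
  <mkdir dir="tmpValidation"/>
  <!-- Copy only the HTML output, so vnu never sees images or scripts. -->
  <copy todir="tmpValidation">
    <fileset dir="site" includes="**/*.html"/>
  </copy>
  <java jar="utilities/vnu/vnu.jar" failonerror="true" fork="true">
    <!-- Passed to the JVM, so it becomes a Java system property
         (assumption: vnu.jar honours it in this mode). -->
    <jvmarg value="-Dnu.validator.client.content-type=application/xhtml+xml"/>
    <!-- Each <arg value="..."/> reaches vnu as exactly one argument. -->
    <arg value="--format"/>
    <arg value="text"/>
    <arg value="--skip-non-html"/>
    <arg value="tmpValidation/"/>
  </java>
</target>
```

Because only .html files are copied, the target does not rely on --skip-non-html behaving correctly.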
Following these steps should produce good XHTML5 (assuming your XSLT is right) and validate it as XHTML.
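For reference, this is how the three XSLT recommendations might fit together in one stylesheet. It is a minimal sketch that assumes XSLT 3.0 in Saxon and TEI input. The title lookup and the single root template are hypothetical placeholders, not the project's actual templates.

```xml
<!-- exclude-result-prefixes belongs here, on xsl:stylesheet, not on xsl:output. -->
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="#all">

  <!-- XHTML serialization without the text/html content-type meta tag. -->
  <xsl:output method="xhtml" include-content-type="no" encoding="UTF-8"
    omit-xml-declaration="yes" normalization-form="NFC"/>

  <xsl:template match="/">
    <!-- HTML5 doctype, written out unescaped. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>
    <html>
      <head>
        <!-- Always declare the character encoding explicitly. -->
        <meta charset="UTF-8"/>
        <!-- Hypothetical: take the page title from the TEI header. -->
        <title><xsl:value-of select="//tei:titleStmt/tei:title[1]"/></title>
      </head>
      <body>
        <!-- Hypothetical placeholder: real body templates would go here. -->
        <xsl:apply-templates select="//tei:text"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The default namespace declaration puts the literal html elements in the XHTML namespace, which the xhtml output method needs in order to recognize them.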
Somehow the 1609 Sonnet file got completely trashed sometime in July. It's as if huge blocks of the XML were moved around at random, with no attention paid to the hierarchy, and it's now completely invalid. It looks unfixable to me. CC agreed to revert to rev 1770, the last known good revision.
Just bringing it up to date. Noticed that eXide seems to be broken in Chromium, though it seems to work OK in Firefox.
Got EC set up and working with svn, and doing basic linking from the VT document to the references. Worked with CC on a batch of layout fixes for the VT text, which is looking a lot better. We did find a block of badly encoded rubbish right at the end, though; we started fixing it, and CC will finish it.
CC provided a list of rendering issues in a lot of documents, which I've worked through. In the process, I've fixed some XSLT and CSS, abstracted some rendering rules into rendition elements for the longer documents, and done a lot of clean-up. I think all the reported problems are now fixed, but there are more to come. There's also more rationalization of styles that could be done on Espines and Maladies.
Added handlers for marginal labels in the normalized text (they always appear on the left, leaving the page numbers unencumbered on the right). Standardized all the marginal labels in the Ville Thierry, and fixed a number of issues with fw (forme work) elements lacking a type in Le Bon Mariage. Tested a freshly built version of the site in the brand-new eXist 3.3.
Got EC set up and working on names/refs; next week we'll start in on XML with her. Discussed the possible release schedule (next week?) with CC; we need a visible "beta" label first. Fixed a bug in the sorting of references (leading articles were not being ignored).
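Ignoring leading articles when sorting usually comes down to sorting on a computed key rather than on the raw title. The XSLT 2.0 sketch below illustrates the idea only; the ref elements, the local:sortKey function, and the list of French articles it strips are assumptions, not the actual fix.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:local="urn:example:local"
  exclude-result-prefixes="#all">

  <!-- Hypothetical helper: lower-case the title and strip one leading
       article (le, la, les, un, une, des, or l' with either apostrophe). -->
  <xsl:function name="local:sortKey" as="xs:string">
    <xsl:param name="title" as="xs:string"/>
    <xsl:sequence select="replace(lower-case(normalize-space($title)),
      '^(le|la|les|un|une|des)\s+|^l[''&#x2019;]', '')"/>
  </xsl:function>

  <!-- Hypothetical input: any <ref> elements, output as a sorted list. -->
  <xsl:template match="/">
    <list>
      <xsl:for-each select="//ref">
        <xsl:sort select="local:sortKey(string(.))" lang="fr"/>
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>
```

With this key, "La Rochelle" sorts under R and "L'Amour" under A, while the displayed text keeps its article.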
EC is joining the team. Set up access to svn, and a time/place for initial training next week.
Using the real copy of Le Bon Mariage, I've checked the TOCs I created and made some adjustments, as well as fixing a pile of other style problems with marginal labels and forme works. That one is looking pretty good now.