Today I worked through a stack of issues in building and validating the site, and I now have some recommendations and insights worth recording.
First, I determined that vnu was parsing our documents as HTML because they had the .html extension. The HTML parser does some processing before validation (such as lower-casing custom data attributes), which we would prefer to avoid. I also discovered that using the XHTML output method in Saxon was, paradoxically, adding a meta tag to the head specifying the content type as text/html, which was also pushing vnu into treating the documents as HTML rather than XHTML. Solutions:
- Use this for the xsl:output element:
  <xsl:output method="xhtml" include-content-type="no" encoding="UTF-8" omit-xml-declaration="yes" exclude-result-prefixes="#all" normalization-form="NFC"/>
  The method attribute gives you correct results in terms of not producing things like self-closed empty div tags. The include-content-type="no" value suppresses the unwanted meta tag with the wrong content type.
- Do the HTML5 doctype like this:
  <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt; </xsl:text>
  It's ugly but it works.
- Always include the charset meta tag:
- Before validating, copy only the HTML files to a fresh, empty directory and validate them there. The reason is explained below.
- For validation using vnu.jar, use this command-line setting:
  -Dnu.validator.client.content-type=application/xhtml+xml
  In an Ant task, it looks like this:
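A minimal sketch of such a build file, under stated assumptions: the property names site.dir, validate.dir, and vnu.jar and their paths are hypothetical, and the prepare target is just one way to do the copy-to-a-fresh-directory step from the list above. Only the -D setting itself comes from these notes.

```xml
<project name="validate-site" default="validate" basedir=".">

  <!-- Hypothetical locations; adjust these to your own build. -->
  <property name="site.dir" location="site"/>
  <property name="validate.dir" location="build/validate"/>
  <property name="vnu.jar" location="lib/vnu.jar"/>

  <!-- Start from a fresh, empty directory containing only the HTML files. -->
  <target name="prepare">
    <delete dir="${validate.dir}"/>
    <mkdir dir="${validate.dir}"/>
    <copy todir="${validate.dir}">
      <fileset dir="${site.dir}" includes="**/*.html"/>
    </copy>
  </target>

  <!-- Run vnu.jar in a forked JVM so the -D setting reaches it. -->
  <target name="validate" depends="prepare">
    <java jar="${vnu.jar}" fork="true" failonerror="true">
      <jvmarg value="-Dnu.validator.client.content-type=application/xhtml+xml"/>
      <arg value="${validate.dir}"/>
    </java>
  </target>

</project>
```

The java task needs fork="true" whenever the jar attribute is used, and failonerror="true" makes the build stop when the validator exits with an error status.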
Following these steps should produce good XHTML5 (assuming your XSLT is right) and validate it as XHTML.
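Pulling the stylesheet-side steps together, a skeleton might look like the following. It is a sketch, not the production stylesheet: the single template, the title, and the body content are placeholders, and the meta element with charset="UTF-8" is assumed here as the charset meta tag, chosen to match the encoding on xsl:output.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- XHTML serialization, without the automatic text/html meta tag. -->
  <xsl:output method="xhtml" include-content-type="no" encoding="UTF-8"
      omit-xml-declaration="yes" exclude-result-prefixes="#all"
      normalization-form="NFC"/>

  <xsl:template match="/">
    <!-- HTML5 doctype, written out unescaped. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt; </xsl:text>
    <html>
      <head>
        <!-- Assumed form of the charset meta tag. -->
        <meta charset="UTF-8"/>
        <title>Placeholder title</title>
      </head>
      <body>
        <!-- Placeholder: real templates for the source content go here. -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Declaring the XHTML namespace as the default on xsl:stylesheet puts the literal html, head, and body elements in that namespace, which the xhtml output method expects.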