XHTML 5, vnu and validation
Posted by mholmes on 01 Sep 2017 in Academic
Today I worked through a stack of issues in building and validating the site, and I now have some recommendations and insights worth recording.
First, I determined that vnu was parsing our documents as HTML because they had the .html extension. The HTML parser does a bunch of pre-validation things (like lower-casing custom data attributes) which we would prefer to avoid. I also discovered that using the XHTML output method in Saxon was paradoxically adding a meta tag to the header specifying content type as text/html, which was also pushing vnu into treating the documents as HTML rather than XHTML. Solutions:
- Use this for the xsl:output element:
<xsl:output method="xhtml" include-content-type="no" encoding="UTF-8" omit-xml-declaration="yes" normalization-form="NFC"/>
Setting method="xhtml" gives you correct results in terms of not producing things like self-closed empty div tags: an empty div is written as <div></div> rather than <div/>. The include-content-type="no" value suppresses the unwanted meta tag with the wrong content type. Note that exclude-result-prefixes="#all" goes on the xsl:stylesheet element; it is not an attribute of xsl:output.
- Do the HTML5 doctype like this:
<xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt; </xsl:text>
It's ugly but it works.
- Always include the charset meta tag:
<meta charset="UTF-8"/>
- Before validating, copy only the HTML files to a fresh, empty directory and validate them there. The reason is the problem with the --skip-non-html flag described below.
- For validation using vnu.jar, use this command-line setting:
-Dnu.validator.client.content-type=application/xhtml+xml
In an Ant task, it looks like this:
<java jar="utilities/vnu/vnu.jar" failonerror="true" fork="true">
  <arg value="-Dnu.validator.client.content-type=application/xhtml+xml"/>
  <arg value="--format text"/>
  <arg value="--skip-non-html"/>
  <arg value="tmpValidation/"/>
</java>
The problem is that once you set the content type as in the first argument, the --skip-non-html flag no longer seems to work: vnu sets about validating every JPEG and JavaScript file in the tree. That is why the HTML files need to be copied to a clean directory first. I think this must be a vnu bug, but I haven't tested thoroughly yet.
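The copy-then-validate workaround from the list above might be wired into Ant roughly as follows. This is only a sketch under assumptions: the project name, the target name and the site.dir and validation.dir properties (and the "site" path) are hypothetical, and the java element is taken unchanged from this post rather than independently verified.

```xml
<project name="validation-sketch" default="validate" basedir=".">

  <!-- Hypothetical locations: point these at your own build output. -->
  <property name="site.dir" value="site"/>
  <property name="validation.dir" value="tmpValidation"/>

  <target name="validate">
    <!-- Start from an empty directory so that no stale files are left over. -->
    <delete dir="${validation.dir}"/>
    <mkdir dir="${validation.dir}"/>

    <!-- Copy only the HTML files; images, scripts and CSS stay behind. -->
    <copy todir="${validation.dir}">
      <fileset dir="${site.dir}" includes="**/*.html"/>
    </copy>

    <!-- The vnu call exactly as given in this post. -->
    <java jar="utilities/vnu/vnu.jar" failonerror="true" fork="true">
      <arg value="-Dnu.validator.client.content-type=application/xhtml+xml"/>
      <arg value="--format text"/>
      <arg value="--skip-non-html"/>
      <arg value="${validation.dir}/"/>
    </java>
  </target>

</project>
```

Because the fileset pattern is **/*.html, the copy keeps the directory structure of the site, so any file paths that vnu reports still map back onto the original tree.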
Following these steps should produce good XHTML5 (assuming your XSLT is right) and validate it as XHTML.
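Put together, the XSLT side of these steps could look something like the stylesheet below. It is a minimal sketch, not our actual build code: the root template, the title and the placeholder div are invented for illustration, and exclude-result-prefixes is placed on xsl:stylesheet, which is where XSLT defines that attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="#all">

  <!-- XHTML serialization, without the generated content-type meta tag. -->
  <xsl:output method="xhtml" include-content-type="no" encoding="UTF-8"
      omit-xml-declaration="yes" normalization-form="NFC"/>

  <!-- Hypothetical root template: replace the body with your own content. -->
  <xsl:template match="/">
    <!-- HTML5 doctype, written out unescaped. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt; </xsl:text>
    <html>
      <head>
        <!-- Always declare the charset explicitly. -->
        <meta charset="UTF-8"/>
        <title>Example page</title>
      </head>
      <body>
        <!-- An empty div: the xhtml method writes separate start and end tags. -->
        <div id="placeholder"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Running this against any input document with Saxon should give a page that starts with the HTML5 doctype and carries only the charset meta tag in its head.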
This entry was posted by Martin and filed under Academic.