After some investigation, I found that the best option for validating XHTML5 is not the RNC schemas from whattf. Those depend on datatype definitions in their own namespace, and I can't find a JAR implementation of them. Instead, the VNU project provides a single JAR file which can do the job. I've integrated that into the project, and the build now validates the XHTML5 pages it produces.
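For anyone wanting to reproduce this outside the build, here is a minimal Python sketch of calling the validator JAR and reading its report. It is not how the project build does it. The function names and the paths in the usage note are invented for illustration. It assumes java is on the PATH and relies on the validator's --format json and --errors-only options.

```python
import json
import subprocess


def build_vnu_command(jar_path, pages):
    """Return the command line for one validator run over all pages."""
    return ["java", "-jar", str(jar_path), "--format", "json", "--errors-only",
            *[str(p) for p in pages]]


def parse_vnu_errors(report_text):
    """Extract (url, line, message) tuples for each error in a JSON report."""
    report = json.loads(report_text)
    return [
        (m.get("url", ""), m.get("lastLine"), m.get("message", ""))
        for m in report.get("messages", [])
        if m.get("type") == "error"
    ]


def validate_pages(jar_path, pages):
    """Run the validator (hypothetical wrapper) and return the parsed errors."""
    result = subprocess.run(build_vnu_command(jar_path, pages),
                            capture_output=True, text=True, encoding="utf-8")
    # The validator writes its report to stderr unless --stdout is given.
    return parse_vnu_errors(result.stderr)
```

Something like validate_pages("lib/vnu.jar", ["site/index.xhtml"]) would then return a list of errors (both paths are placeholders). If Java itself fails to start, stderr will not be JSON and json.loads will raise, which is probably the right behaviour for a build step.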
Two sets of errors were easily fixed. First, the encoders seem to have been making up language codes as they went along, so there were hundreds of invented ones; I've fixed all of those. Second, the files are not in Unicode NFC. I have XSLT to fix that in the XML, but I haven't run it yet. Instead, I built normalization into the pre-processing of the XML, which solves the problem for the XHTML output and also protects against unnormalized data arriving in the future. I will still fix the normalization in the source XML files soon.
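To show what the NFC step does, here is a small Python illustration using the standard unicodedata module. The project's own normalization is done in XSLT during pre-processing, so this is only a sketch of the same idea, and the helper names are invented.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


def needs_nfc(text):
    """Return True if text is not already in NFC."""
    return to_nfc(text) != text


# 'e' followed by COMBINING ACUTE ACCENT (U+0301) is two code points;
# NFC composes them into the single code point U+00E9.
decomposed = "e\u0301"
composed = to_nfc(decomposed)
```

A check like needs_nfc could be run over the source XML files to see how many are affected before the permanent fix is applied.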
What's left is about 1100 errors of the expected type (divs inside spans and the like). Each will have to be looked at, in the hope of finding generic fixes that cover lots of them at once. Meanwhile, validation of XHTML5 is now a solved problem, which is very useful, and I'll be able to port this approach to other contexts. I don't know yet whether I can find a way to validate all the fragment files, though. One option would be to build them all into a single file, validate that, and then delete it.
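The single-file idea for fragments could look roughly like the Python sketch below. Everything specific in it is an assumption: that each fragment is well-formed markup with no XML declaration, that it is valid as flow content inside a div, that fragments match the glob "*.html", and that "validate" is whatever callable runs the validator on one path. The data-fragment attribute is only there so an error can be traced back to its source file.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import quoteattr

XHTML_HEAD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">\n'
    '<head><title>Combined fragments</title></head>\n'
    '<body>\n'
)
XHTML_TAIL = '</body>\n</html>\n'


def combine_fragments(fragments):
    """Wrap (name, markup) pairs in one XHTML5 document, one div per fragment."""
    parts = [XHTML_HEAD]
    for name, markup in fragments:
        # quoteattr escapes the name and adds the surrounding quotes.
        parts.append(f"<div data-fragment={quoteattr(name)}>\n{markup}\n</div>\n")
    parts.append(XHTML_TAIL)
    return "".join(parts)


def validate_combined(fragment_dir, validate, pattern="*.html"):
    """Build the combined file, pass its path to validate, then delete it."""
    fragments = [(p.name, p.read_text(encoding="utf-8"))
                 for p in sorted(Path(fragment_dir).glob(pattern))]
    # The temporary directory and the combined file are removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        combined = Path(tmp) / "combined.xhtml"
        combined.write_text(combine_fragments(fragments), encoding="utf-8")
        return validate(combined)
```

One drawback is that line numbers in the report refer to the combined file, not to the individual fragments, so the wrapper div is the main clue to where an error came from.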