Work on TEI P4 to P5 default namespace problem
This work grows out of the P4-to-P5 TEI conversions I've been doing on several projects (including ScanCan, EMLS and ultimately ACH). The TEI provides some sample stylesheets whose approach to conversion keeps the output free of any namespacing until right at the end, when a final stylesheet attempts to add the namespace. I was having trouble with this stylesheet, written by Syd Bauman, and began working with him on developing a test case we can use to get some serious advice about the best approach. This morning I worked through some basic tests, and reported as follows to Syd:
I've been trying to figure this one out, and a core problem is that you can't create a valid TEI P5 document which links to a schema (XSD file) but is not already in a namespace. For testing purposes I've used a document that is not in a namespace anyway.
Here's the minimal document:
<?xml version="1.0" encoding="UTF-8"?>
<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Minimal test document</title>
      </titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>This electronic file is the original document.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <head>Minimal test document</head>
    <p>This is an absolute minimal test document for P5 XSLT processing.</p>
  </body></text>
</TEI>
Here's the minimal stylesheet:
<?xml version="1.0"?>
<!-- One variation is to switch between version 2.0 and 1.0. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Another variation is to uncomment the template below. -->
  <!--
  <xsl:template match="TEI">
    <xsl:element name="TEI" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates />
    </xsl:element>
  </xsl:template>
  -->
  <!-- XSLT Template to copy anything, priority="-1" -->
  <xsl:template match="@*|node()|text()|comment()|processing-instruction()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()|text()|comment()|processing-instruction()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Here are my results, testing under oXygen:
1. With ONLY the copy-anything template:
-No xmlns attribute is added to the root under any of the following circumstances:
-XSLT 1.0 under Xalan, xsltproc, or Saxon 6.
-XSLT 2.0 under Saxon 8.
2. With the TEI match template enabled:
-XSLT 2.0 under Saxon 8: The xmlns attribute IS added to the root, but empty xmlns attributes are also added to its two child nodes (teiHeader and text).
-XSLT 1.0 under Saxon 6: Ditto.
-XSLT 1.0 under xsltproc: Ditto.
-XSLT 1.0 under Xalan: YES! "Correct" result; the xmlns attribute is added to the root, but NO empty xmlns attributes appear below.
So the situation seems to be that only with XSLT 1.0 under Xalan can we get the result we want, and we can only achieve that by matching the root element and setting the namespace attribute on xsl:element.
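A possible explanation for the empty xmlns attributes, based on my reading of the namespace rules and not confirmed by these tests: xsl:copy keeps each element's original name, so the children of the new root are still in no namespace, and the serializer has to write xmlns="" on them to cancel the default declared on the root. A minimal sketch of a variation that rebuilds every element in the TEI namespace instead of copying it might look like this. It has not been run under any of the processors listed above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild every element under the same local name,
       but in the TEI namespace, instead of copying it. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Attributes, text, comments and processing instructions
       carry no default namespace, so they can be copied as they are. -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Because every element is created by xsl:element with an explicit namespace, none is left in no namespace, so there should be nothing for the serializer to undeclare further down the tree.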
Now we try using the apparently-wrong (according to our research) method, where the stylesheet looks like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="TEI">
    <xsl:element name="TEI">
      <xsl:attribute name="xmlns">http://www.tei-c.org/ns/1.0</xsl:attribute>
      <xsl:apply-templates />
    </xsl:element>
  </xsl:template>
  <!-- XSLT Template to copy anything, priority="-1" -->
  <xsl:template match="@*|node()|text()|comment()|processing-instruction()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()|text()|comment()|processing-instruction()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Results:
-XSLT 2.0 under Saxon 8: Invalid XSLT -- xmlns is not allowed as an attribute name.
-XSLT 1.0 under Saxon 6: No attribute is added. File is unchanged.
-XSLT 1.0 under Xalan: Ditto.
-XSLT 1.0 under xsltproc: Namespace IS added.
So in this case, the only working setup is with xsltproc.
It seems to me there's no reliable way to do this right now, so practically speaking, perhaps the whole approach of generating elements not in a namespace and then trying to put them in a namespace at the end is, if not wrong, then impractical. Perhaps all the stylesheets should carry the xmlns attribute in their root elements just so it's always the default namespace for output right through the process. I haven't tested that, though.
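To make that last suggestion concrete, here is a rough, untested sketch of a conversion stylesheet that declares the TEI namespace as its default. The choice of templates is an assumption for illustration only and is not taken from the TEI sample stylesheets; the one P4-specific rule shown is the renaming of the P4 root element TEI.2 to the P5 root element TEI.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0">

  <!-- Illustrative rule: the P4 root TEI.2 becomes the P5 root TEI.
       <TEI> is a literal result element, so it is created in the
       default namespace declared on xsl:stylesheet. -->
  <xsl:template match="TEI.2">
    <TEI>
      <xsl:apply-templates select="@*|node()"/>
    </TEI>
  </xsl:template>

  <!-- Hypothetical catch-all: with no namespace attribute, xsl:element
       resolves an unprefixed name against the default namespace in
       scope, so these elements also land in the TEI namespace. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Everything that is not an element is copied unchanged. -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that output elements are in the namespace from the moment they are created, so no final stylesheet has to add it afterwards.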
I hope this helps. Let me know if you find any different results, or if the gurus can give you a straight answer about this!