Progress on the sort comparator
Today I learned some Java, which is pretty much new to me. I had to implement and test a Java class that implements the java.util.Comparator interface; the class can then be used as a sort of plug-in to Saxon and invoked from XSLT to do custom sorting. I downloaded and installed Eclipse and set up a new package containing two source files: one for the comparator class and one for a JUnit test class to try it out.
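For anyone else new to Java, here is a minimal sketch of the shape such a class takes. The name SkeletonComparator is hypothetical and the compare() body is only a placeholder (plain String.compareTo); the public no-argument constructor is there on the assumption that a plug-in host like Saxon creates the object itself from the class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical skeleton of a plug-in comparator: a public class with a
// public no-argument constructor and a compare() method.
public class SkeletonComparator implements Comparator<String> {

    public SkeletonComparator() {
        // Nothing to set up yet; a real comparator might build lookup tables here.
    }

    @Override
    public int compare(String a, String b) {
        // Placeholder ordering: UTF-16 code unit order via String.compareTo.
        // Negative: a sorts first; positive: b sorts first; zero: equal.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        Collections.sort(words, new SkeletonComparator());
        System.out.println(words); // [apple, fig, pear]
    }
}
```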
With lots of help from Stew, I eventually got the class working. It bases its ordering on a simple string that lists all the characters in their sort order. The first useful discovery was the Java Normalizer class, which solves the problem of sorting strings that may contain either precomposed characters or the equivalent sequences of a base character plus combining characters. The Normalizer can perform a canonical decomposition of both strings before they are compared, so the two forms are treated alike. Very handy, and it might also be useful at some point for permanently normalizing the actual data.
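A small sketch of the decomposition step, assuming the JDK's java.text.Normalizer (available since Java 6) with the NFD form; NormalizerDemo and decompose are made-up names. A precomposed é (U+00E9) and the sequence e + U+0301 differ as raw strings but become identical after NFD:

```java
import java.text.Normalizer;

public class NormalizerDemo {

    // Canonical decomposition (NFD): precomposed characters such as U+00E9
    // are split into a base letter followed by combining mark(s).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";   // é as a single code point
        String combining = "e\u0301";    // e + COMBINING ACUTE ACCENT

        System.out.println(precomposed.equals(combining));                       // false
        System.out.println(decompose(precomposed).equals(decompose(combining))); // true
        System.out.println(decompose(precomposed).length());                     // 2
    }
}
```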
Testing the sort results showed that my initial assumption, that the diacritics etc. should go at the beginning of the character list, was wrong; to get the desired behaviour, they actually need to be at the end. That was easily fixed.
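Putting the two ideas together, a comparator driven by an ordering string might look roughly like this. Everything here is hypothetical: the class name, the toy ORDER alphabet (lowercase letters followed by five combining marks, placed at the end as the testing showed), and the choice to rank characters missing from ORDER (capitals, in this toy alphabet) after everything listed, by code unit value. It also assumes the values being compared arrive as Strings.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: ordering defined by position in an ordering string.
public class OrderStringComparator implements Comparator<String> {

    // Toy alphabet: plain letters first, combining diacritics at the END.
    private static final String ORDER =
        "abcdefghijklmnopqrstuvwxyz"
        + "\u0300\u0301\u0302\u0303\u0308"; // grave, acute, circumflex, tilde, diaeresis

    // Rank = position in ORDER; unlisted characters rank after all listed
    // ones, ordered among themselves by code unit value.
    private static int rank(char c) {
        int i = ORDER.indexOf(c);
        return i >= 0 ? i : ORDER.length() + c;
    }

    @Override
    public int compare(String a, String b) {
        // Decompose first so precomposed and combining forms compare alike.
        String x = Normalizer.normalize(a, Normalizer.Form.NFD);
        String y = Normalizer.normalize(b, Normalizer.Form.NFD);
        int n = Math.min(x.length(), y.length());
        for (int i = 0; i < n; i++) {
            int d = rank(x.charAt(i)) - rank(y.charAt(i));
            if (d != 0) {
                return d;
            }
        }
        // All shared positions are equal: the shorter string sorts first.
        return x.length() - y.length();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("\u00E9a", "ez", "e", "ea"));
        Collections.sort(words, new OrderStringComparator());
        System.out.println(words);
    }
}
```

With the marks at the end, a decomposed é sorts after every plain continuation of e, so ez comes before éa; with the marks at the beginning, éa would instead come before ea. Note that this comparator treats é and e + U+0301 as equal even though the strings differ, which is acceptable for sorting.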
Once the class was working, we started trying to use it from XSLT. In XSLT 2.0 a custom collation is identified by a URI, and how that URI is mapped to an actual collation is implementation-dependent. We intend to use it with Saxon 8, whose documentation explains how to do this. The sort instruction looks like this:
<xsl:sort select="tei:form[1]/tei:pron[1]/tei:seg[1]" collation="http://saxon.sf.net/collation?class=MosesSortComparator" />
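The class parameter in the collation URI is simply a class name for Saxon to look up. The helper below (CollationUriDemo.classParam, a made-up name) extracts that value with java.net.URI, purely to illustrate what the URI carries; it is not how Saxon itself parses collation URIs. One thing worth checking: since the class was created inside a package, the value presumably needs to be the fully qualified name (package plus class), not just the bare class name.

```java
import java.net.URI;

public class CollationUriDemo {

    // Hypothetical helper: returns the value of the "class" query parameter,
    // or null if the URI has no such parameter.
    static String classParam(String collationUri) {
        String query = URI.create(collationUri).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("class")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String uri = "http://saxon.sf.net/collation?class=MosesSortComparator";
        System.out.println(classParam(uri)); // MosesSortComparator
    }
}
```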
Next, the class has to be on the Java classpath so that Saxon can find it. We assumed this meant putting it with Cocoon's other Java libraries, so I generated a JAR file (File / Export in Eclipse) and copied it into the directory holding the other JAR files on the server, /usr/local/apache-tomcat-6.0.2/webapps/cocoon/WEB-INF/lib.
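A quick way to check whether a JAR is actually visible is to load the class by name, which is roughly what a plug-in host has to do. The sketch below assumes the comparator needs a public no-argument constructor; ClasspathCheck is a hypothetical name, and the class to look for is passed on the command line.

```java
import java.util.Comparator;

public class ClasspathCheck {

    // Loads and instantiates a comparator by class name.
    // Throws ClassNotFoundException if the class (or its JAR) is not on the
    // classpath, and ClassCastException if it is not a Comparator.
    @SuppressWarnings("unchecked")
    static Comparator<Object> load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        // Requires a public no-argument constructor.
        return (Comparator<Object>) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "MosesSortComparator";
        try {
            Comparator<Object> c = load(name);
            System.out.println("Loaded " + c.getClass().getName());
        } catch (ClassNotFoundException e) {
            System.out.println("Not on classpath: " + name);
        } catch (ReflectiveOperationException | ClassCastException e) {
            System.out.println("Found but unusable: " + e);
        }
    }
}
```

For example, java -cp /path/to/your.jar:. ClasspathCheck MosesSortComparator (the JAR path is a placeholder) reports whether the class can be found and instantiated. This only checks a standalone JVM's classpath; inside Tomcat, it is the web application's WEB-INF/lib that counts.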
Initial testing failed. Puzzled, I went back to the sitemap and discovered that although the stylesheet was XSLT 2.0, it was being run through the default XSLT processor, Xalan, which only supports XSLT 1.0. When I changed the sitemap to call the Saxon processor, I got no results at all (an empty page). This happened both with and without the new comparator, so the comparator isn't the problem; the stylesheet itself apparently doesn't work correctly under Saxon, and we'll need to rewrite it before we can see whether the sort actually works. That's for tomorrow.