Root-based index sort order
SK reported a problem with the sort order of entries in the root-based index. I dug into it and found that the main Moses-to-English entries appear to be sorted in the correct order. For those entries, a sort key is first created like this:
<xsl:variable name="sortKey" select="if (descendant::orth) then normalize-space(descendant::orth[1]) else normalize-space(string-join(for $s in descendant::pron[seg[@type='p']]/descendant::seg[@type='p'] return hcmc:createOrth($s), ''))"/>
In other words, if the entry has an orth, the key is the first orth; if not, the key is built by converting every descendant phonemic seg with hcmc:createOrth and joining the results. The entries are then sorted on that key using the orthographic collation:
<xsl:sort select="@sortKey" collation="http://saxon.sf.net/collation?class=ca.uvic.hcmc.moses.MosesOrthographyCollation"/>
When it comes to processing the root-based index, we were doing something slightly different:
<xsl:sort select="if (descendant::orth) then descendant::orth[1] else hcmc:createOrth(descendant::pron[seg[@type='p']][1]/descendant::seg[@type='p'][1])" collation="http://saxon.sf.net/collation?class=ca.uvic.hcmc.moses.MosesPhonemicCollation"/>
In other words, we were using the phonemic collation. I can't remember when, where, or why we ended up with both phonemic and orthographic collations (there must have been a reason), but I've now switched the root-based index sort to use the orthographic one. That appears to fix the problem, but SK will check for any unwanted fallout.
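For reference, the revised root-based index sort would look something like the following. This is a sketch that assumes only the collation URI changed and the select expression was left exactly as it was; the collation class name is the one already used for the main Moses-to-English entries.

```xml
<!-- Root-based index sort, switched to the orthographic collation.
     Assumption: the select expression is unchanged from the previous version;
     only the class named in the collation URI differs. -->
<xsl:sort select="if (descendant::orth) then descendant::orth[1] else hcmc:createOrth(descendant::pron[seg[@type='p']][1]/descendant::seg[@type='p'][1])" collation="http://saxon.sf.net/collation?class=ca.uvic.hcmc.moses.MosesOrthographyCollation"/>
```

With both sorts pointing at the same collation class, entries that have an orth should be ordered by the same rules in the main listing and in the root-based index.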