SMK and I worked hard on secondary sorting today, and we now have a working system for the Lexical Suffixes group, which is our core test case. The basic idea works like this:
- For each major group, the first part of the sort key is created by that group's own key function, which is customized according to how the group should be sorted. In the case of Lexical Suffixes, for example, the first part of the key is a concatenation of the item's lexical suffix ids, in the order in which they appear in the item, separated by underscores.
- The custom function then calls a generic getSecondarySortKey function, which creates the remainder of the sort key by generating a series of sort key components, working through the groups in print order. For each major group, the component consists of an underscore, a letter prefix (a, b, c, etc.), another underscore, and then the xml:id of the item's morpheme in that group. The secondary key function has a parameter that lets it ignore the primary key component, since that has already been handled by the custom function.
- The result is a key on which each major group can be sorted. So far this has been implemented for everything up to lexical suffixes; in other words, it generates keys for each of the major groups that are harvested AFTER lexical suffixes, and handles those groups in print order. (Anything harvested before lexical suffixes will not appear in the Lexical Suffixes group anyway, because it will already have been placed in a preceding group.)
- The underscore has been added to the MosesPhonemicCollation class at the beginning of the sort order, so it can be used to massage the sort order when necessary.
- TODO #1: complete the secondary sort key function so that it handles everything still missing (Compounds, Primary Affixes and Lexical Prefixes).
- TODO #2: Add the final three sort categories (Gi, Gii, Giii) at the end of the sequence.
- TODO #3: Write or fix up the custom sort functions for all of the other major groups.
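The key-building scheme described in the list can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not our actual code: an item is modelled as a dict mapping group names to the xml:ids of its morphemes, `get_secondary_sort_key` is a Python-style stand-in for getSecondarySortKey, `lexical_suffix_sort_key` and the group names in the example are hypothetical, the caller is assumed to pass in only the groups that need keys (in print order), and the handling of groups with no morpheme or with several morphemes is a guess.

```python
from string import ascii_lowercase


def get_secondary_sort_key(item, print_order, skip_group=None):
    """Sketch of the secondary part of the sort key.

    item: dict of group name -> list of morpheme xml:ids (hypothetical model).
    print_order: the groups needing keys, in print order (assumed input).
    skip_group: group already handled by the custom primary key function.
    """
    parts = []
    # Each group gets a letter prefix (a, b, c, ...) from its print position.
    for prefix, group in zip(ascii_lowercase, print_order):
        if group == skip_group:
            continue  # primary component already built by the custom function
        ids = item.get(group, [])
        # Assumption: several ids are joined with "_"; a missing group
        # contributes an empty id, leaving just "_<letter>_".
        parts.append("_" + prefix + "_" + "_".join(ids))
    return "".join(parts)


def lexical_suffix_sort_key(item, print_order):
    """Hypothetical custom key function for the Lexical Suffixes group."""
    # Primary part: lexical suffix ids in order of appearance, "_"-separated.
    primary = "_".join(item.get("lexicalSuffixes", []))
    return primary + get_secondary_sort_key(
        item, print_order, skip_group="lexicalSuffixes")


order = ["lexicalSuffixes", "groupX", "groupY"]  # hypothetical group names
item = {"lexicalSuffixes": ["ls1", "ls5"], "groupX": ["m42"]}
print(lexical_suffix_sort_key(item, order))  # ls1_ls5_b_m42_c_
```

Because every component starts with its own underscore and letter, a key records which group each id came from, so two items sharing the same lexical suffixes are then ordered group by group in print order.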
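To see why it helps to put the underscore at the very beginning of the MosesPhonemicCollation sort order, compare a plain code-point sort with one in which "_" comes first. The `collation_key` function below is a hypothetical stand-in for that class: it reproduces only the "underscore first" property and otherwise falls back to code-point order, whereas the real collation defines its own ordering for the rest of the alphabet. The example keys are made up.

```python
def collation_key(s):
    """Hypothetical stand-in for MosesPhonemicCollation.

    Each character maps to (is_not_underscore, char), so "_" sorts before
    every other character; everything else keeps code-point order.
    """
    return [(ch != "_", ch) for ch in s]


keys = ["ls12_b_m3", "ls1_ls5_b_m42"]  # made-up example keys

# Plain code-point order: "2" (U+0032) sorts before "_" (U+005F).
print(sorted(keys))
# Underscore-first order: "ls1_..." now sorts before "ls12...".
print(sorted(keys, key=collation_key))
```

With plain code-point order, the key for an item whose first lexical suffix is ls12 lands ahead of one whose first suffix is ls1; with the underscore first, every key beginning "ls1_" sorts before any key beginning "ls12", so items sharing the same leading suffix id stay together.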
This entry was posted by and is filed under Activity log.