This never ends. We are now settling on a canonical sort order for the morphemes within each morpheme group, so that the secondary sort key can use zero-padded numbers instead of morpheme ids (which don't sort alphabetically in any useful order). We have also added a final item to the secondary sort key: the text of the hyph, which is used to differentiate two items with identical morphemic structure. Finally, I've added many more Latin glyphs to the MosesPhonemicSort collation (which is now our default "sort for all occasions" library) so that it can handle, for example, upper-case letters in morpheme ids, and so that it ignores morpheme delimiter characters, which improves the final item of the key described above.
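To make the shape of the key concrete, here is a minimal Python sketch of the idea, not the project's actual code. Everything named in it is an assumption for illustration: the `CANONICAL_ORDER` table, the group and morpheme ids, the set of delimiter characters, the four-digit padding width, and the case-folding of the final item (which only roughly stands in for what the MosesPhonemicSort collation does).

```python
import re

# Hypothetical canonical order of morphemes within each morpheme group.
# The real groups and ids live in the project data; these are made up.
CANONICAL_ORDER = {
    "grpA": ["m-Zeta", "m-alpha", "m-beta"],
}

# Assumed morpheme delimiter characters to ignore in the final key item.
DELIMITERS = re.compile(r"[-+=.]")


def morpheme_rank(group, morpheme_id, width=4):
    """Return the morpheme's 1-based canonical position, zero-padded."""
    position = CANONICAL_ORDER[group].index(morpheme_id) + 1
    return str(position).zfill(width)


def secondary_sort_key(group, morpheme_ids, hyph_text):
    """Build a key from zero-padded ranks plus the hyph text as tie-breaker."""
    ranks = [morpheme_rank(group, m) for m in morpheme_ids]
    # Strip delimiters and fold case so the tie-breaker compares letters only.
    tiebreak = DELIMITERS.sub("", hyph_text).lower()
    return " ".join(ranks + [tiebreak])


if __name__ == "__main__":
    print(secondary_sort_key("grpA", ["m-alpha", "m-Zeta"], "Ab-cd"))
```

Because the ranks are zero-padded to a fixed width, a plain string comparison orders them numerically, and two items with the same sequence of ranks fall back to comparing the delimiter-free hyph text.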