Finished and tested the XSLT from yesterday; SMK will check results before we hard-run it and change the data.
Further to our discussions on numbers, I have added the following to feature_system.xml:
1) wordType numberStem. So ECH will add this <fs> to the number stems 1-10.
2) countingType "ten"
I have also added the following <fs> to lexical suffix "akst-2", so ECH can use this morpheme for marking up the numbers 30, 40 ... 90.
MDH will then search for entries with this <fs> to build a test column for the table of numerical expressions. We can subsequently add more countingType values to the feature system, and to the entries for the appropriate lexical suffixes with classifier functions, and generate more columns for the table.
Discussions and decisions on how to handle numbers and counters: new wordType of cardinalNumeral, new lexicalSuffix type of numeralClassifier. These will be applied, and then harvesting will be done to generate a table of numerical expressions which will form the basis of decisions on how/whether to create a special section in the print dictionary.
SK pointed out that the English-Moses index was sorting Js to the end, and indeed when I looked at the collation that we're using for all sorting (MosesPhonemicCollation, which is designed to handle both English and Moses), J was omitted from the sequence. I added it to the source, installed NetBeans and recompiled the jar, and all seems to be well. I was happy to see that NetBeans was its usual trouble-free self: it installed quickly and worked out of the box, and although it complained that a dependency ("hamcrest") was missing from the project, it added it for me, resolving the issue painlessly.
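For the record, a minimal Python sketch of why an omitted letter ends up at the end of a sorted index. This is not the MosesPhonemicCollation source (which is Java); the two sequences and the make_sort_key helper are made up for illustration, and the assumption is simply that any character missing from the sequence ranks after every listed one.

```python
# Hypothetical collation sequences, NOT the real MosesPhonemicCollation.
SEQUENCE_WITHOUT_J = "abcdefghiklmnopqrstuvwxyz"   # "j" accidentally omitted
SEQUENCE_WITH_J = "abcdefghijklmnopqrstuvwxyz"


def make_sort_key(sequence):
    """Return a key function that ranks characters by position in `sequence`.

    Characters missing from the sequence rank after all listed characters.
    """
    rank = {ch: i for i, ch in enumerate(sequence)}
    fallback = len(sequence)

    def key(word):
        return [(rank.get(ch, fallback), ch) for ch in word.lower()]

    return key


words = ["kettle", "jump", "ice", "zebra"]
broken = sorted(words, key=make_sort_key(SEQUENCE_WITHOUT_J))
fixed = sorted(words, key=make_sort_key(SEQUENCE_WITH_J))

print(broken)  # "jump" lands after "zebra"
print(fixed)   # "jump" lands between "ice" and "kettle"
```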
Also added a new Schematron rule to the set, to catch entries with no pron/seg[@type='p'], at SK's request; that caught 19 additional errors, which she's fixing.
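The actual check is a Schematron rule; the same test can be sketched in Python for spot checks outside the validation run. The sample markup is invented, it ignores namespaces, and it assumes the pron sits somewhere inside each entry; only the pron/seg[@type='p'] path comes from the rule itself.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample data; the real files are namespaced and far richer.
SAMPLE = """<body>
  <entry xml:id="e1"><form><pron><seg type="p">one</seg></pron></form></entry>
  <entry xml:id="e2"><form><pron><seg type="n">two</seg></pron></form></entry>
  <entry xml:id="e3"><form/></entry>
</body>"""


def entries_missing_phonemic_seg(root):
    """Return the ids of entries with no pron/seg[@type='p'] descendant."""
    missing = []
    for entry in root.iter("entry"):
        if entry.find(".//pron/seg[@type='p']") is None:
            missing.append(entry.get(XML_ID))
    return missing


missing_ids = entries_missing_phonemic_seg(ET.fromstring(SAMPLE))
print(missing_ids)
```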
Per SK's request, new report on entries ending with a specific sequence of chars.
Greg pointed out that we are using Ø (Latin Capital Letter O with Stroke, U+00D8) for our zero morpheme marker, rather than ∅ (Empty Set, U+2205). The latter is noted in the Unicode character map as the one used in linguistics to indicate a null morpheme or phonological zero.
We have at least been consistent in our use of the former! We may not have known the Empty Set character existed when we chose the other one in 2010, or it may have been a font-based choice. (I'm using Aboriginal Sans in Oxygen right now, and the Empty Set character doesn't display properly.)
Martin will add this change to his list of global changes to make when improving our current encoding, if we can be assured of fonts that include Empty Set along with all the other special characters we need.
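A rough Python sketch of what the global change might look like, assuming every U+00D8 in the data really is a zero morpheme marker (that would need checking first, since the same character is an ordinary letter in some orthographies). The function names and the sample string are made up.

```python
import unicodedata

LATIN_O_STROKE = "\u00D8"  # the character currently in the data
EMPTY_SET = "\u2205"       # the character Greg suggests


def count_zero_markers(text):
    """Count occurrences of U+00D8, to gauge the scope before changing data."""
    return text.count(LATIN_O_STROKE)


def swap_zero_marker(text):
    """Replace every U+00D8 with U+2205 (assumes all are zero markers)."""
    return text.replace(LATIN_O_STROKE, EMPTY_SET)


sample = "stem-\u00D8"  # invented example string
print(unicodedata.name(LATIN_O_STROKE), "->", unicodedata.name(EMPTY_SET))
print(count_zero_markers(sample), swap_zero_marker(sample))
```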
Diagnosed the borkedness of a borked XML file; fixed some XSLT; tried building the dictionary only to discover that of course XEP wasn't set up in Oxygen; reconfigured all the old hard-coded paths in build tasks; built the PDF; and more tweaks. Reminder to self: the diagnostics page is erroneously including an extra include for the personography, minus its file extension; needs fixing.
Beefed up the diagnostic processing of feature structures to add stats tables, revealing that many vals are never used. Food for thought. But no new errors revealed, which is good.
In our discussion today about segments with more than one possible hyph, we also revisited how hyphs should look in print dictionary entries. We currently show the hyph followed by the "translated hyph" - e.g.:
[[x̣mánk-n-c • love-CTR-TR.1SgObj.3TrSbj]]
The final segment -c is shown to be composed of 3 morphemes, separated by periods: TR.1SgObj.3TrSbj
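To make the convention concrete, here is a small Python sketch that pairs each hyphen-separated segment of the hyph with its labels and splits syncretic labels on periods. It assumes no label ever contains a hyphen or period of its own; split_gloss is a made-up helper, not part of our build.

```python
def split_gloss(gloss):
    """Split a translated hyph into segments, and each segment into labels."""
    return [segment.split(".") for segment in gloss.split("-")]


form = "x̣mánk-n-c"
gloss = "love-CTR-TR.1SgObj.3TrSbj"

# Pair each form segment with the list of labels it realizes.
pairs = list(zip(form.split("-"), split_gloss(gloss)))
for segment, labels in pairs:
    print(segment, labels)
```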
We are concerned that this will not be transparent to learner users of the dictionary. So we decided to update the <label> elements to include the first <pron> of the morpheme entry, where that would be helpful - e.g.
We need to think further about exactly how this should be implemented.
We should also check again how Montler 2012 represented syncretic morphemes in these "translated hyphs" in his root-based index. See photocopies in Print Dictionary Working Notes folder.
And we need to add an index of labels somewhere in the dictionary front matter!
Added a check for morphemes in completed files not pointing at existing entries. There are 222.
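The logic of the check, sketched in Python rather than as it actually runs in the diagnostics. The element and attribute names (m, corresp) and the sample data are assumptions for illustration only; the idea is just to collect every xml:id and flag any local pointer whose target is not among them.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample: the pointer to "suffixA" has no matching entry.
SAMPLE = """<body>
  <entry xml:id="root1">
    <hyph><m corresp="#root1">root</m>-<m corresp="#suffixA">A</m></hyph>
  </entry>
  <entry xml:id="suffixB"/>
</body>"""


def dangling_pointers(root):
    """Return local pointers (#id) on <m> elements that match no xml:id."""
    known = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for m in root.iter("m"):
        target = m.get("corresp", "")
        if target.startswith("#") and target[1:] not in known:
            missing.append(target)
    return missing


dangling = dangling_pointers(ET.fromstring(SAMPLE))
print(len(dangling), dangling)
```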
This is an XML dictionary project based primarily on the materials compiled by the late M. Dale Kinkade during fifteen years of work in the 1960s and 1970s with more than a dozen native speakers of the language, but it also includes materials compiled by Ewa Czaykowska-Higgins in the early 1990s.