Answers to questions below
Some answers to your questions:
1. Question: There are several cases of two or more different affixes having almost identical meanings. This means that they have identical feature structures. Is this going to be a problem? For example, xix and xax are both baseType, suffix, morphoSyntactic, indefinite-object.
I don't see why this would be a problem. We identify them by their xml:id, and display/sort them by their first <pron>, so I don't see any conflict.
2. Question about prons and hyphs of reduplicative morphemes: How should prons and hyphs of reduplications be represented? Reduplicative morphemes have changeable form, depending on what the shape of the base of the reduplication is. For example, if the root is of the shape xit, the reduplicative suffix “characteristic” will be xit (xit-xit), but if the root is of the shape quc, the reduplicative suffix “characteristic” will be quc (quc-quc). The basic shape of the reduplication is thus CVC (consonant-vowel-consonant), but what the exact segmental content of the suffix is depends on the segments found in the root. The simplest thing for a pron would be to specify the CV-shape of each reduplication. For example, the pron for the reduplicative suffix whose meaning is “characteristic” would be CVC, for the distributive it would be CEC (where E=schwa), for repetitive it would be Ca, for out of control it would be VC, and for diminutive it would be C-. For the hyph forms, it would be the same type of thing. For example, for characteristic the hyph would thus include sameAs="CHAR">CVC. Is it possible/desirable to do this in XML markup?
As long as the xml:id attributes are distinct, I don't think it matters. If each has a unique CV-shape, then that would be a good way to characterize them, given that they have no default or normalized representation at all.
3. I have completed to the end of hard copy affix10 of the affix files, except for fixing cross-references in the last entry, which is the DIM form. There is one more of these files left in the affix set.
I'm not sure what this means. On the server, I can only see one affix.xml file. Is "affix10" above a typo, or is there such a file somewhere?
1. combining glottal and combining comma. By combining comma do you mean the combining apostrophe? If so, then combining glottal and combining comma/apostrophe are two different symbols for the same sound: they both represent glottalization. We decided at some point in December, I think, that we will actually replace all the combining glottals with combining commas/apostrophes in order to be completely consistent throughout. The one constraint on this is that we have to continue to use combining glottals in the xml:ids.
As we've said before, Greg and I both think replacing the glottals with commas is a bad idea, because it amounts to misrepresenting the data. It's also rather pointless, because for any particular context in which we're displaying this data, we can do a translation from glottal to comma on the fly; there's no need to store misleading data just so we can see it on the page. The combining comma I'm talking about, though, is one which appears above the w and y characters in the handwritten alphabetical order I've been working from. That character is "U+0313 : COMBINING COMMA ABOVE", whereas the combining glottal is "U+02C0 : MODIFIER LETTER GLOTTAL STOP". The former shows up above the modified letter, the latter shows up to the right of it. If I understand you correctly, these are intended to represent the same sound -- a glottal -- but you want the modifier to appear above the letter when the letter is w or y, and on the right of it when it's any other letter. This is problematic because if you convert them all to combining comma above, they'll appear above the letter everywhere; so using commas won't even solve the display problem it's intended to solve.
My recommendation is to keep your data correct and pure, and use the right character throughout (the glottal). Then for display purposes we write display code that does a translation in some circumstances (e.g. it substitutes a comma above for w and y, and adds an apostrophe after for other letters, if that's what you want). If the data itself is corrupted by display preferences, then it's going to be less useful for research and display in the future, in other contexts. That's my opinion, anyway.
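To make the recommendation concrete, here is a minimal sketch of what such display code might look like in Java. It is only an illustration under stated assumptions: the class and method names (GlottalDisplay, forDisplay) are hypothetical, a plain apostrophe is assumed for the "after the letter" case, and only lowercase w and y are treated specially. The stored string is never changed; the translation happens on a copy at display time.

```java
// Hypothetical display-time translation: the stored data keeps U+02C0 throughout.
final class GlottalDisplay {
    // U+02C0 MODIFIER LETTER GLOTTAL STOP, the character kept in the data.
    static final char GLOTTAL = '\u02C0';
    // U+0313 COMBINING COMMA ABOVE, shown over w and y.
    static final char COMMA_ABOVE = '\u0313';
    // Assumption: a plain apostrophe is what should follow any other letter.
    static final char APOSTROPHE = '\'';

    static String forDisplay(String stored) {
        StringBuilder out = new StringBuilder(stored.length());
        for (int i = 0; i < stored.length(); i++) {
            char c = stored.charAt(i);
            if (c != GLOTTAL) {
                out.append(c);
                continue;
            }
            // Walk back over any combining marks (e.g. dot below) to find the base letter.
            int j = i - 1;
            while (j >= 0 && Character.getType(stored.charAt(j)) == Character.NON_SPACING_MARK) {
                j--;
            }
            char base = j >= 0 ? stored.charAt(j) : '\0';
            out.append(base == 'w' || base == 'y' ? COMMA_ABOVE : APOSTROPHE);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the display form of a made-up string containing two glottals.
        System.out.println(forDisplay("w\u02C0at\u02C0"));
    }
}
```

Because the rule lives in one method, a different display context could swap in a different rule without touching the data.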
2. Acute and grave accents are irrelevant for alphabetical order. In other words there is no difference in alphabetical order between [a] with no accent, [a] with acute accent, and [a] with grave accent; and this is similar for all the other vowels. Does this mean that the Java sorter can ignore the accents?
I'll have to go away and think about that. We've actually got the Java class sorting successfully, but right now it needs a position for every character in the alphabet (which includes accents); I'll have to add some code to strip out the accents before comparing words. I think it should be fairly straightforward.
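A rough sketch of the accent-stripping step, assuming the accents are stored either as precomposed letters or as the combining characters U+0301 (acute) and U+0300 (grave). The class and method names are hypothetical and this is not the existing sorter code. It decomposes the word with java.text.Normalizer and removes only those two marks, so that other diacritics, which do matter for the ordering, are left alone.

```java
import java.text.Normalizer;

// Hypothetical helper: removes acute and grave accents before words are compared.
final class AccentStripper {
    static String stripAccents(String word) {
        // NFD splits precomposed letters such as U+00E1 into base letter + U+0301.
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        // Drop only U+0300 (grave) and U+0301 (acute); every other mark is kept.
        return decomposed.replaceAll("[\\u0300\\u0301]", "");
    }

    public static void main(String[] args) {
        // Both spellings reduce to the same string, so they would sort identically.
        System.out.println(stripAccents("x\u00E1t").equals(stripAccents("xat")));
    }
}
```

Note that the result is left in decomposed form, so the sorter's table of alphabet positions would need to be in the same form; that is an assumption of this sketch.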
3. What I can’t determine from your presentation of the material is whether there is significance to the order that you have given for the diacritics. Why have you placed [dot below] before [combining glottal], etc.? Can you explain this to me?
This is based on your own handwritten list, in which c-with-dot-below appears before c-with-glottal (and the same for all other combinations of these diacritics with other characters). If c-with-dot-below comes before c-with-glottal, then dot-below comes before glottal.
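To illustrate how the position of a letter-plus-diacritic in the list drives the sort, here is a small sketch of a comparator built from an ordered alphabet. It is an assumption-laden illustration, not the actual sorter: the class name AlphabetOrder is hypothetical, and any alphabet passed to it in an example is a made-up fragment, not the real list. Each word is split into alphabet units by longest match, and words are compared unit by unit using each unit's position in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator: the order of units in the list is the alphabetical order.
final class AlphabetOrder implements Comparator<String> {
    private final List<String> units;

    AlphabetOrder(List<String> units) {
        this.units = units;
    }

    // Convert a word into a list of alphabet positions, preferring the longest unit.
    private List<Integer> positions(String word) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            int best = -1;
            int bestLen = 0;
            for (int u = 0; u < units.size(); u++) {
                String unit = units.get(u);
                if (unit.length() > bestLen && word.startsWith(unit, i)) {
                    best = u;
                    bestLen = unit.length();
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("No alphabet unit at index " + i + " of " + word);
            }
            result.add(best);
            i += bestLen;
        }
        return result;
    }

    @Override
    public int compare(String a, String b) {
        List<Integer> pa = positions(a);
        List<Integer> pb = positions(b);
        int n = Math.min(pa.size(), pb.size());
        for (int k = 0; k < n; k++) {
            int diff = Integer.compare(pa.get(k), pb.get(k));
            if (diff != 0) {
                return diff;
            }
        }
        // A word that is a prefix of another sorts first.
        return Integer.compare(pa.size(), pb.size());
    }

    public static void main(String[] args) {
        // Made-up fragment: plain c, then c with dot below, then c with glottal.
        AlphabetOrder order = new AlphabetOrder(
                Arrays.asList("a", "c", "c\u0323", "c\u02C0", "t"));
        List<String> words = new ArrayList<>(Arrays.asList("c\u02C0a", "ca", "c\u0323a"));
        words.sort(order);
        System.out.println(words);
    }
}
```

With a list like the one in main, c-with-dot-below sorts before c-with-glottal simply because it comes earlier in the list, which is the same reasoning applied to the handwritten order.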