I'm working on SMK's instructions for hyphs here. I've implemented the first part, which is easy: it's just a search-and-replace on strings. But I'm struggling with the second part, mainly because I don't understand the examples properly. My questions are below; waiting for clarification from ECH.
[INSTRUCTIONS]
-- when generating the translated hyph,
a) Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"):
For example: [[√ʔiɬ<CVC>n-úl • √eat<char>-attrib]]
BUT, if the root has no gloss, DO keep the second part of the root:
For example: [[k-√cúwˀ<CVC>x=ánaʔ • loc-√cúwˀ<char>x=ear]]
b) Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root.
For example: [[√p<a>tix̣ʷ • <rep>√test]]
Again, if the root has no gloss, keep the first part of the root.
For example: [[√p<a>tix̣ʷ • √p<rep>tix̣ʷ]]
[/INSTRUCTIONS]
The first example comes from this (I'll pretty-print the hyph for clarity):
<hyph>
√
<m corresp="m:ʔiɬn">ʔiɬ</m>
+
<m corresp="m:CHAR">CVC</m>
+
<m corresp="m:ʔiɬn">n</m>
-
<m corresp="m:ul">úl</m>
</hyph>
Question 1: Can I ignore the intervening characters between the <m> elements for the purposes of detecting infixes? For instance, can I search for a sequence of:
<m>rootX</m> <m>CHAR</m> <m>rootX</m>
and be sure it's OK to delete the second root, regardless of what text nodes happen to intervene? Or might there be cases such as
<m>rootX</m>-<m>CHAR</m>-<m>rootX</m>
where instead of + characters, there are hyphens, and the relationship is now entirely different so the deletion should not be triggered?
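One way to settle Question 1 empirically would be to scan the data for root / infix / root triples and record which separators actually occur between them. The following is only a rough Python sketch of that audit, not the real implementation. It assumes that the <m> elements are un-namespaced direct children of <hyph> (the real files may well be namespaced), and that the corresp values for the three infixes are "m:" plus the xml:id values given in the instructions. The function name find_infixed_roots is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed corresp values for inchoative, characteristic and out-of-control,
# built from the xml:ids in the instructions; adjust if the data differs.
INFIX_IDS = {"m:ʔ", "m:CHAR", "m:OC"}


def find_infixed_roots(hyph_xml):
    """Return (index of second root <m>, (separator 1, separator 2)) for
    every <m>rootX</m> <m>infix</m> <m>rootX</m> sequence in one <hyph>."""
    hyph = ET.fromstring(hyph_xml)
    ms = hyph.findall("m")
    hits = []
    for i in range(len(ms) - 2):
        first, infix, second = ms[i], ms[i + 1], ms[i + 2]
        same_root = first.get("corresp") == second.get("corresp")
        if infix.get("corresp") in INFIX_IDS and same_root:
            # The text nodes between the elements are the tails of the
            # first two <m>s; strip pretty-printing whitespace.
            seps = ((first.tail or "").strip(), (infix.tail or "").strip())
            hits.append((i + 2, seps))
    return hits


example = (
    '<hyph>√<m corresp="m:ʔiɬn">ʔiɬ</m>+<m corresp="m:CHAR">CVC</m>+'
    '<m corresp="m:ʔiɬn">n</m>-<m corresp="m:ul">úl</m></hyph>'
)
print(find_infixed_roots(example))  # [(2, ('+', '+'))]
```

Running something like this over every hyph and tallying the separator pairs would show whether anything other than ('+', '+') ever turns up. If it never does, the question is moot; if it does, those are the cases to ask about.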
Question 2: I'm a bit confused about the idea of retaining the second root if it has no gloss. Why? The example comes from this hyph:
<hyph>
<m corresp="m:k-LOC">k</m>
-√
<m corresp="m:cuwx">cúwˀ</m>
+
<m corresp="m:CHAR">CVC</m>
+
<m corresp="m:cuwx">x</m>
=
<m corresp="m:anaʔ">ánaʔ</m>
</hyph>
and the entry xml:id="cuwx" is indeed lacking a gloss (it's an inferred entry). But if we delete reduplicated roots in most cases and not in this one, aren't people going to assume that the second instance of the morpheme, which shows up as "x", is something else entirely? They will expect a second instance to have been deleted already, as it would be in most normal cases. Are we expecting people to distinguish between a case where a root disappears because it has a gloss, and one where it doesn't disappear because it has no gloss? That seems extremely confusing to me. I would naturally assume that if reduplicated roots are normally deleted, that's the case here too, and that the "x" is a subsequent and completely different morpheme (especially since it bears no resemblance to the first instance, "cúwˀ").