Lots of work this morning on clarifying what we should do with hyphs. Here's the breakdown:
- We should fix the delimiters in the source data, not in the output process. That means:
  - -ʔ- becomes <ʔ>
  - +a+ becomes <a>
  - +C₂+ becomes <C₂>
  - +CVC+ becomes <CVC>
  BUT ONLY (in the last two cases) where the same root morpheme appears before and after the sequence.
- The string-replacement code I wrote yesterday to crudely accomplish this in the output should be removed, since the standard hyph output will now be correct anyway.
- The deletions mentioned in SMK's post should be carried out by pre-processing the whole hyph before the "translated hyph" is created:
  - Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"), but only when they are infixes; you can now tell this context by the angle-bracket text nodes surrounding them. Note that there may be more than one infix separating the two roots (there are no instances of this right now, but there will be as more data is processed).
  - Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root (again, only where it is an infix, determined by surrounding text nodes).
- The other part of SMK's post, relating to the situation where a root morpheme has no gloss, is now changed: we do not keep the other instance of the root, but instead we replace the unglossed root with a smallcapped label "Unk", signifying "unknown".
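The pre-processing rules above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the token model (`Morph`, plain strings for text nodes), the function names, and the choice of "√" as the root symbol are all assumptions made for the example, and the ids `r1` etc. are placeholders rather than real morphemes.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

DROP_RIGHT = {"ʔ", "CHAR", "OC"}  # infixes after which the second root copy is deleted
DROP_LEFT = {"REP"}               # infix before which the first root copy is deleted
ROOT_SYMBOL = "√"                 # assumption: the actual root symbol is not stated


@dataclass
class Morph:
    """Hypothetical stand-in for a morpheme reference inside a hyph."""
    morph_id: str
    is_root: bool = False
    gloss: Optional[str] = None


Token = Union[str, Morph]  # a str is a delimiter text node such as "<", ">" or "-"


def infix_run(tokens: List[Token], start: int):
    """Collect consecutive '<', Morph, '>' triples beginning at index start."""
    ids = []
    i = start
    while (i + 2 < len(tokens) and tokens[i] == "<"
           and isinstance(tokens[i + 1], Morph) and tokens[i + 2] == ">"):
        ids.append(tokens[i + 1].morph_id)
        i += 3
    return ids, i


def preprocess(tokens: List[Token]) -> List[Token]:
    """Apply the two root-deletion rules to root + infix(es) + same root."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if isinstance(tok, Morph) and tok.is_root:
            ids, end = infix_run(tokens, i + 1)
            nxt = tokens[end] if end < len(tokens) else None
            same_root = (isinstance(nxt, Morph) and nxt.is_root
                         and nxt.morph_id == tok.morph_id)
            if ids and same_root:
                infixes = tokens[i + 1:end]
                if set(ids) <= DROP_RIGHT:
                    # keep the first root, drop the second copy
                    out.append(tok)
                    out.extend(infixes)
                    i = end + 1
                    continue
                if set(ids) <= DROP_LEFT:
                    # drop the first root, mark the second with the root symbol
                    out.extend(infixes)
                    out.append(ROOT_SYMBOL)
                    out.append(nxt)
                    i = end + 1
                    continue
        out.append(tok)
        i += 1
    return out


def gloss_label(morph: Morph) -> Optional[str]:
    """An unglossed root is labelled 'Unk'; small caps are left to the output stage."""
    if morph.is_root and not morph.gloss:
        return "Unk"
    return morph.gloss


def render(tokens: List[Token]) -> str:
    """Flatten tokens to a string, purely for inspecting results."""
    return "".join(t if isinstance(t, str) else t.morph_id for t in tokens)


example = [Morph("r1", is_root=True), "<", Morph("ʔ"), ">", Morph("r1", is_root=True)]
print(render(preprocess(example)))  # r1<ʔ>
```

Sequences delimited by hyphens rather than angle brackets are left untouched, since only the angle-bracket text nodes mark an infix. A run that mixes REP with the other three infixes is also left alone here, because the notes above do not say how that case should be handled.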
ECH also sends these instructions re changes to the indexes in the appendix, having changed the feature structures of clitics. I've implemented these:
Put the List of Root Morphemes first, and maybe change the headings as I've indicated here:
Four Appendices
1. List of Root Morphemes: all roots (but not stems) (i.e. anything with <f name="baseType"> <symbol value="root"/></f>)
2. List of Lexical Affixes (from lex-pref.xml and lex-suf.xml)
3. List of Grammatical Morphemes: all grammatical affixes (those in the five affix xml files) plus inflectional clitics. The inflectional clitics are defined as <f name="baseType"> <symbol value="clitic"/> AND <f name="cliticType"> <symbol value="inflectional"/>
4. List of Particles: all particles (particles.xml),
It makes sense to have the List of Root Morphemes because it provides different information from what is in the Root-based Index. The Root-based Index is a listing of all the words in the dictionary organized by root, with morphological breakdowns; the List of Root Morphemes in the Appendix is simply a list of all the root morphemes. It is therefore a subset of the information in the Root-based Index, but in listing only morphemes it parallels the other three appendices, which are lists of different categories of morphemes. So the Appendices will list all the individual morphemes in the dictionary.
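The feature-structure tests that decide which appendix a morpheme belongs to (baseType "root" for the first list; baseType "clitic" plus cliticType "inflectional" for the clitics in the third) could be checked along these lines. This is a minimal sketch, assuming an un-namespaced `<fs>` wrapper around the `<f>` elements; the wrapper element, the function names, and the sample entry are invented for illustration, and the real files may well use a namespace.

```python
import xml.etree.ElementTree as ET


def feature_value(fs, name):
    """Return the <symbol> value of the named <f> inside fs, or None if absent."""
    for f in fs.iter("f"):
        if f.get("name") == name:
            sym = f.find("symbol")
            return sym.get("value") if sym is not None else None
    return None


def is_root(fs):
    """Appendix 1 test: baseType is 'root'."""
    return feature_value(fs, "baseType") == "root"


def is_inflectional_clitic(fs):
    """Appendix 3 clitic test: baseType 'clitic' AND cliticType 'inflectional'."""
    return (feature_value(fs, "baseType") == "clitic"
            and feature_value(fs, "cliticType") == "inflectional")


# Made-up sample entry, only to exercise the two tests.
sample = ET.fromstring(
    '<fs><f name="baseType"><symbol value="clitic"/></f>'
    '<f name="cliticType"><symbol value="inflectional"/></f></fs>'
)
print(is_root(sample), is_inflectional_clitic(sample))  # False True
```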