I've started mapping out XML structures for the properties and all their contents, and sent some queries to JS-R. Looks like a TEI extension with a lot of specialist elements and attributes is what we need, and I've started an ODD file.
Lots of work this morning on clarifying what we should do with hyphs. Here's the breakdown:
-ʔ- becomes <ʔ>
+a+ becomes <a>
+C₂+ becomes <C₂>
+CVC+ becomes <CVC>
BUT ONLY (in the last two cases) where the same root morpheme appears before and after the sequence.
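A minimal Python sketch of how that breakdown could be applied. It assumes each hyph is a <hyph> element whose morphemes are <m corresp="..."> children separated by text-node punctuation; the function name bracket_infixes and the two rule tables are hypothetical, and this is not necessarily how the project's own transformation is written.

```python
import xml.etree.ElementTree as ET

# (separator, surface text) pairs that are always bracketed.
ALWAYS = {("-", "ʔ"), ("+", "a")}
# Pairs bracketed only when the same root morpheme sits on both sides.
SAME_ROOT_ONLY = {("+", "C₂"), ("+", "CVC")}


def bracket_infixes(hyph_xml):
    """Return the hyph as a flat string with infixes rewritten as <...>."""
    hyph = ET.fromstring(hyph_xml)
    ms = list(hyph)
    # seps[i] is the text before ms[i]; seps[i + 1] is the text after it.
    seps = [(hyph.text or "").strip()] + [(m.tail or "").strip() for m in ms]
    out_seps = list(seps)  # tests read seps, edits go to out_seps
    pieces = []
    for i, m in enumerate(ms):
        text = m.text or ""
        before, after = seps[i], seps[i + 1]
        inside_root = (
            0 < i < len(ms) - 1
            and ms[i - 1].get("corresp") == ms[i + 1].get("corresp")
        )
        key = (before, text)
        is_infix = before == after and (
            key in ALWAYS or (key in SAME_ROOT_ONLY and inside_root)
        )
        if is_infix:
            # Drop the separators on both sides and wrap the infix.
            out_seps[i] = out_seps[i + 1] = ""
            pieces.append("<" + text + ">")
        else:
            pieces.append(text)
    return out_seps[0] + "".join(p + s for p, s in zip(pieces, out_seps[1:]))
```

Comparing the corresp values on either side is one way to test "same root morpheme before and after"; a +CVC+ between two different morphemes is left untouched.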
ECH also sends these instructions re changes to the indexes in the appendix, having changed the feature structures of clitics. I've implemented these:
Put the List of Root Morphemes first, and maybe change the headings as I've indicated here:
Four Appendices
1. List of Root Morphemes
   all roots (but not stems) (i.e. anything with <f name="baseType"> <symbol value="root"/></f>)
2. List of Lexical Affixes (from lex-pref.xml and lex-suf.xml)
3. List of Grammatical Morphemes
   -all grammatical affixes (those in the five affix xml files) plus inflectional clitics
   The inflectional clitics are defined as <f name="baseType"> <symbol value="clitic"/> AND <f name="cliticType"> <symbol value="inflectional"/>
4. List of Particles
   -all particles (particles.xml),
It makes sense to have the List of Root Morphemes because this provides different information from what is in the Root-based Index. The Root-based Index is a listing of all the words in the dictionary organized by root, and with morphological breakdowns; the List of Root Morphemes in the Appendix is simply a list of all the root morphemes. It is therefore a subset of the information in the Root-based Index, but in listing only morphemes it parallels the other three appendices, which are lists of different categories of morphemes. So the Appendices will list all the individual morphemes in the dictionary.
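The two feature-structure tests in those instructions (roots, and inflectional clitics) can be sketched in Python as follows. The sample document, the wrapper elements entries, entry and fs, the baseType value "stem", and the ids hypotheticalClitic and hypotheticalStem are all assumptions made for illustration. Only the f/symbol patterns come from the instructions. Lexical affixes and particles are selected by source file, so they are not covered here.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample data; real entries will carry much more content.
SAMPLE = """<entries>
  <entry xml:id="cuwx">
    <fs><f name="baseType"><symbol value="root"/></f></fs>
  </entry>
  <entry xml:id="hypotheticalClitic">
    <fs>
      <f name="baseType"><symbol value="clitic"/></f>
      <f name="cliticType"><symbol value="inflectional"/></f>
    </fs>
  </entry>
  <entry xml:id="hypotheticalStem">
    <fs><f name="baseType"><symbol value="stem"/></f></fs>
  </entry>
</entries>"""


def symbol_value(entry, name):
    """Return the symbol value of the named feature, or None if absent."""
    for f in entry.iter("f"):
        if f.get("name") == name:
            sym = f.find("symbol")
            return sym.get("value") if sym is not None else None
    return None


def appendix_for(entry):
    """Pick the appendix for an entry using only its feature structure."""
    base = symbol_value(entry, "baseType")
    if base == "root":
        return "List of Root Morphemes"
    if base == "clitic" and symbol_value(entry, "cliticType") == "inflectional":
        return "List of Grammatical Morphemes"
    return None  # decided by source file, or not listed at all


def group_entries(root):
    """Map each appendix heading to the xml:id values that belong in it."""
    groups = {}
    for entry in root.iter("entry"):
        heading = appendix_for(entry)
        if heading is not None:
            groups.setdefault(heading, []).append(entry.get(XML_ID))
    return groups
```

Calling group_entries(ET.fromstring(SAMPLE)) puts the root in the first appendix and the inflectional clitic with the grammatical morphemes; the stem is left out.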
Posting time spent on the Pro-D expenses claim for my TESOL conference attendance.
Preparation for editorial meeting next week.
It seems that video conferencing will be done through BlueJeans, so SA and I tested this out; there's a simple deb for Linux, and it works fine in Firefox, with around a half-second delay between our desks. UVic supports it.
I'm working on SMK's instructions for hyphs here. I've implemented the first part, which is easy: it's just a search-and-replace on strings. But I'm struggling with the second part, mainly because I don't understand the examples properly. My questions are below; waiting for clarification from ECH.
[INSTRUCTIONS]
-- when generating the translated hyph,
a) Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"):
   For example: [[√ʔiɬ<CVC>n-úl • √eat<char>-attrib]]
   BUT, if the root has no gloss, DO keep the second part of the root:
   For example: [[k-√cúwˀ<CVC>x=ánaʔ • loc-√cúwˀ<char>x=ear]]
b) Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root.
   For example: [[√p<a>tix̣ʷ • <rep>√test]]
   Again, if the root has no gloss, keep the first part of the root.
   For example: [[√p<a>tix̣ʷ • √p<rep>tix̣ʷ]]
[/INSTRUCTIONS]
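One possible reading of rules (a) and (b), sketched in Python for a single root + infix + root triple. The Morph class, the function name translate_infix, and the lower-case glosses "char" and "rep" are assumptions for illustration; the sketch only aims to reproduce the four examples given in the instructions and does not settle the questions raised about them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Morph:
    id: str               # xml:id of the morpheme entry
    form: str             # surface text of this piece in the hyph
    gloss: Optional[str]  # None when the entry has no gloss


DROP_SECOND = {"ʔ", "CHAR", "OC"}  # rule (a): the root gloss precedes the infix
DROP_FIRST = {"REP"}               # rule (b): the root gloss follows the infix


def translate_infix(first, infix, second):
    """Translate root + infix + root, where both root pieces share one entry."""
    tag = "<" + (infix.gloss or infix.id) + ">"
    if first.gloss is None:
        # No gloss on the root: keep both surface pieces around the infix.
        return "√" + first.form + tag + second.form
    if infix.id in DROP_SECOND:
        # One gloss for the whole root; the second instance is dropped.
        return "√" + first.gloss + tag
    if infix.id in DROP_FIRST:
        # The first instance is dropped; the root symbol moves after the infix.
        return tag + "√" + first.gloss
    raise ValueError("no rule for infix " + infix.id)
```

For the first example, the triple ʔiɬ / CVC / n with the root glossed "eat" gives √eat<char>, to which the rest of the hyph (-attrib) would be appended by whatever code walks the remaining morphemes.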
The first example comes from this (I'll pretty-print the hyph for clarity):
<hyph>
  √
  <m corresp="m:ʔiɬn">ʔiɬ</m>
  +
  <m corresp="m:CHAR">CVC</m>
  +
  <m corresp="m:ʔiɬn">n</m>
  -
  <m corresp="m:ul">úl</m>
</hyph>
Question 1: Can I ignore the intervening characters between the <m> elements for the purposes of detecting infixes? For instance, can I search for a sequence of:
<m>rootX</m> <m>CHAR</m> <m>rootX</m>
and be sure it's OK to delete the second root, regardless of what text nodes happen to intervene? Or might there be instances of, for instance,
<m>rootX</m>-<m>CHAR</m>-<m>rootX</m>
where instead of + characters, there are hyphens, and the relationship is now entirely different so the deletion should not be triggered?
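While waiting for an answer, Question 1 could also be checked against the data itself. The Python sketch below tallies which separator pairs actually surround a rootX / infix / rootX triple, so any hyphen cases would show up in the counts. The function name separator_patterns is hypothetical, and the "m:" prefix on the infix ids is assumed from the corresp values in the sample hyph.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed corresp values for the four morphemes named in the instructions.
INFIX_IDS = {"m:ʔ", "m:CHAR", "m:OC", "m:REP"}


def separator_patterns(hyph_xml_list):
    """Count (before, after) separator pairs around root/infix/root triples."""
    counts = Counter()
    for xml_text in hyph_xml_list:
        ms = list(ET.fromstring(xml_text))
        # Slide a window of three adjacent <m> elements over the hyph.
        for prev, mid, nxt in zip(ms, ms[1:], ms[2:]):
            same_root = prev.get("corresp") == nxt.get("corresp")
            if mid.get("corresp") in INFIX_IDS and same_root:
                before = (prev.tail or "").strip()
                after = (mid.tail or "").strip()
                counts[(before, after)] += 1
    return counts
```

If every triple in the dictionary turns out to be ("+", "+"), ignoring the intervening text nodes is safe in practice; any other pair is a case to send back for clarification.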
Question 2: I'm a bit confused about the idea of retaining the second root if it has no gloss. Why? The example comes from this hyph:
<hyph>
  <m corresp="m:k-LOC">k</m>
  -√
  <m corresp="m:cuwx">cúwˀ</m>
  +
  <m corresp="m:CHAR">CVC</m>
  +
  <m corresp="m:cuwx">x</m>
  =
  <m corresp="m:anaʔ">ánaʔ</m>
</hyph>
and the entry xml:id="cuwx" is indeed lacking a gloss (it's an inferred entry). But if we delete reduplicated roots in most cases, but not in this one, aren't people going to assume that the second instance of the morpheme, which shows up as "x", is something else entirely, because they will assume that a second instance has already been deleted, as it would be in most normal cases? Are we expecting people to distinguish between a case where a root disappears because it has a gloss, and one where it doesn't disappear because it doesn't have a gloss? That seems extremely confusing to me. I would naturally assume that if reduplicated roots are normally deleted, that's the case here too, and the "x" is a subsequent and completely different morpheme (especially since it bears no resemblance to the first instance, "cúwˀ").
Trying to debug a page-rendering issue in XSL-FO for the Moses project.
We have excludes now tested and working, and I've created some more useful sets of update scripts.
Also, GN hacked our fonts to add subscript 1 and 2, since we need these, and the font author has not responded to our requests.