Met with ID and TM, visiting from Texas, and had a good long discussion about dictionary production, layout, publication and related topics. Very helpful.
Completed all the tasks in the previous post. Surprised to discover that background watermarking was not very difficult. Created an SVG file for the watermark, and a temporary xsl:attribute-set which sets it as a background-image, then assigned the temporary attribute-set to all the fo:region-body elements in the output. Done.
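The watermark is done here in XSLT, with a temporary xsl:attribute-set. To show the same effect outside the stylesheet, here is a minimal Python sketch that post-processes an XSL-FO document with ElementTree and sets background-image on every fo:region-body. This is an illustrative alternative, not the method used in the project. The watermark filename, the SVG dimensions and the styling are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # keep the fo: prefix when serializing


def draft_svg(text: str = "DRAFT ONLY") -> str:
    """Return a simple SVG with pale, diagonal watermark text (styling is arbitrary)."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="600" height="800">'
        '<text x="300" y="400" font-size="72" fill="#dddddd" '
        'text-anchor="middle" transform="rotate(-45 300 400)">'
        f"{escape(text)}</text></svg>"
    )


def add_watermark(fo_xml: str, svg_uri: str = "draft-watermark.svg") -> str:
    """Set background-image on every fo:region-body in an XSL-FO document."""
    root = ET.fromstring(fo_xml)
    for region in root.iter(f"{{{FO_NS}}}region-body"):
        # background-image is a standard XSL-FO property taking a url(...) value
        region.set("background-image", f"url('{svg_uri}')")
    return ET.tostring(root, encoding="unicode")
```

Because every region-body gets the image, recto and verso masters are both covered. Keeping the attribute-set temporary, as the post does, presumably makes it easy to drop once the draft is final.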
Arising out of today's meeting:
- When rendering appendix indexes, place the label after the full list of allomorphs, not after the first one. This currently applies only to the Grammatical Morphemes, but will apply later to Lexical Affixes (see below).
- All entries in lex-suf and lex-pref should have a <label> (the last element in the first <def>) containing "la".
- When rendering a list of allomorphs in the indexes, use only a tilde, not a comma + tilde, to separate them.
- Create another appendix index of all the placeName entries (where the <form>/<pron>/<seg> contains a <placeName> element). There are currently only two in the completed files, but 264 across all the files.
- Investigate whether a watermark for DRAFT ONLY can be added in XEP, so that we don't have to worry so much about the PDF being shared before it's finished.
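Here is a minimal Python sketch of three of the checks above. It assumes un-namespaced TEI-style elements (real TEI files would need the TEI namespace in each path) and a single space on either side of the tilde, which the post doesn't specify. The function names are hypothetical, and the real rendering lives in the XSLT.

```python
import xml.etree.ElementTree as ET
from typing import List


def render_index_entry(allomorphs: List[str], label: str) -> str:
    """Join allomorphs with a bare tilde (no comma) and put the label after the full list."""
    return " ~ ".join(allomorphs) + " " + label


def has_la_label(entry: ET.Element) -> bool:
    """True if the last element in the entry's first <def> is a <label> containing "la"."""
    first_def = entry.find(".//def")
    if first_def is None or len(first_def) == 0:
        return False
    last = first_def[-1]
    return last.tag == "label" and (last.text or "").strip() == "la"


def has_place_name(entry: ET.Element) -> bool:
    """True if any <form>/<pron>/<seg> in the entry contains a <placeName>."""
    return entry.find("form/pron/seg//placeName") is not None
```

Filtering entries with has_place_name would yield the candidates for the new placeName appendix index.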
Per decisions in the previous post:
- Appendix index entries now include all allomorphs.
- Where allomorphs have different feature structures (through <vAlt> elements in the <fs>), the prefixes and suffixes for affixation are now sensitive to those differences and supply the correct versions for each allomorph (this needs rigorous confirmation by ECH).
I've also done a considerable amount of cleanup of the rendering of all indexes, especially the root-based index, which had headwords hanging over into the page margin.
One outstanding question: the root-based index headwords do not currently include allomorphs. I think they probably shouldn't (it's cleaner and clearer without them, and in any case you would most likely reach them from the main entries), but if we decide otherwise, all that's needed is to import code from the outputExtraIndexEntry template in fo_extra_indexes.xsl into the outputEntry template in fo_root_based_index.xsl.
Currently, allomorphs are not included in the appendix indexes, and they should be (listed immediately after the first). This will require changes to the outputEntry template. In addition, the hcmc:getAffixPrefix(), hcmc:getAffixSuffix() and hcmc:getAffixDelimiter() functions will need to be made less crude. Right now, they read the first descendant::symbol element, but they'll have to be aware of which allomorph is the subject of the operation, and choose the correct symbol where there is a <vAlt> element.
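Here is a minimal sketch of the allomorph-aware lookup, written in Python with ElementTree rather than as the actual hcmc: XSLT functions. It assumes un-namespaced TEI-style markup (<fs>/<f>/<vAlt>/<symbol>). It also assumes that the n-th <symbol> inside a <vAlt> belongs to the n-th allomorph. That correspondence is an assumption, not something this post states. The feature name affixPosition used in the usage is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def symbol_for_allomorph(fs: ET.Element, feature: str, allomorph_index: int) -> Optional[str]:
    """Pick the symbol value of a feature for one allomorph, honouring <vAlt>."""
    f = fs.find(f"f[@name='{feature}']")
    if f is None:
        return None
    valt = f.find("vAlt")
    if valt is None:
        # No alternation: every allomorph shares the single symbol.
        sym = f.find("symbol")
        return None if sym is None else sym.get("value")
    symbols = valt.findall("symbol")
    if not symbols:
        return None
    # ASSUMPTION: alternatives are listed in allomorph order; fall back to the first.
    if 0 <= allomorph_index < len(symbols):
        return symbols[allomorph_index].get("value")
    return symbols[0].get("value")
```

Unlike reading the first descendant::symbol, this returns a different value for the second allomorph when a <vAlt> is present.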
Based on feedback from ECH, I've made a number of changes to the rendering of the indexes in the appendix. In the process of discussing this yesterday, we noticed there are many oddities in the placement of gloss tags and related spaces, so I did some regex work to pull up a few hundred candidate issues and fixed the ones that needed doing.
Lots of work this morning on clarifying what we should do with hyphs. Here's the breakdown:
- We should fix the delimiters in the source data, not in the output process. That means:
  -ʔ- becomes <ʔ>
  +a+ becomes <a>
  +C₂+ becomes <C₂>
  +CVC+ becomes <CVC>
  BUT ONLY (in the last two cases) where the same root morpheme appears before and after the sequence.
- The string-replacement code I wrote yesterday to crudely accomplish this in the output should be removed, since the standard hyph output will now be correct anyway.
- The deletions mentioned in SMK's post should be carried out by pre-processing the whole hyph before the "translated hyph" is created:
- Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"), but only when they are infixes; you can now detect this context from the angle-bracket text nodes surrounding them. Note that there may be more than one infix separating the two roots (there are no instances of this right now, but there will be as more data is processed).
- Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root (again, only where it is an infix, determined by surrounding text nodes).
- The other part of SMK's post, relating to the situation where a root morpheme has no gloss, is now changed: we do not keep the other instance of the root, but instead we replace the unglossed root with a smallcapped label "Unk", signifying "unknown".
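The deletion rules above can be sketched in Python. This is a minimal illustration, not the project's XSLT. It makes these assumptions:

- The hyph has already been flattened into a list of text nodes (plain strings) and <m> references, with the "m:" prefix removed from corresp.
- The source data already uses the new < > delimiters.
- Multiple infixes appear as consecutive < > groups.
- Glosses come from a plain dictionary keyed by entry id.

Any morpheme without a gloss is rendered as "Unk", and the smallcaps styling is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass
class M:
    """An <m> element: the entry id it points to (corresp without "m:") and its text."""
    corresp: str
    text: str


Token = Union[str, M]  # a plain str is a text node (delimiter) between <m> elements

DROP_RIGHT_AFTER = {"ʔ", "CHAR", "OC"}  # xml:ids: inchoative, characteristic, out of control
DROP_LEFT_BEFORE = {"REP"}              # xml:id: repetitive


def is_delim(tok: Token, char: str) -> bool:
    return isinstance(tok, str) and tok.strip() == char


def infixed_roots(tokens: List[Token]) -> List[Tuple[int, int, List[str]]]:
    """Find runs root <infix> [<infix> ...] root where both roots are the same entry."""
    spans = []
    for i, tok in enumerate(tokens):
        if not isinstance(tok, M):
            continue
        j, infixes = i + 1, []
        # Consume one or more "<" infix ">" groups after the candidate root.
        while (j + 2 < len(tokens) and is_delim(tokens[j], "<")
               and isinstance(tokens[j + 1], M) and is_delim(tokens[j + 2], ">")):
            infixes.append(tokens[j + 1].corresp)
            j += 3
        if (infixes and j < len(tokens) and isinstance(tokens[j], M)
                and tokens[j].corresp == tok.corresp):
            spans.append((i, j, infixes))
    return spans


def translated_hyph(tokens: List[Token], glosses: Dict[str, str]) -> str:
    drop: Set[int] = set()
    strip_root_symbol: Set[int] = set()  # text nodes that lose their trailing √
    add_root_symbol: Set[int] = set()    # surviving roots that gain a leading √
    for left, right, infixes in infixed_roots(tokens):
        if DROP_RIGHT_AFTER.intersection(infixes):
            drop.add(right)
        elif DROP_LEFT_BEFORE.intersection(infixes):
            drop.add(left)
            prev = tokens[left - 1] if left > 0 else None
            if isinstance(prev, str) and prev.strip().endswith("√"):
                strip_root_symbol.add(left - 1)
                add_root_symbol.add(right)
    out = []
    for k, tok in enumerate(tokens):
        if k in drop:
            continue
        if isinstance(tok, str):
            text = tok.strip()
            out.append(text[:-1] if k in strip_root_symbol else text)
        else:
            gloss = glosses.get(tok.corresp, "Unk")  # unglossed morphemes become "Unk"
            out.append("√" + gloss if k in add_root_symbol else gloss)
    return "".join(out)
```

With the ʔiɬn example (glosses eat, char, attrib), the sketch yields √eat<char>-attrib. The unglossed cuwx example comes out as loc-√Unk<char>=ear.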
ECH also sends these instructions re changes to the indexes in the appendix, having changed the feature structures of clitics. I've implemented these:
Put the List of Root Morphemes first, and maybe change the headings as I've indicated here:
Four Appendices
1. List of Root Morphemes
   - all roots (but not stems) (i.e. anything with <f name="baseType"><symbol value="root"/></f>)
2. List of Lexical Affixes (from lex-pref.xml and lex-suf.xml)
3. List of Grammatical Morphemes
   - all grammatical affixes (those in the five affix xml files) plus inflectional clitics. The inflectional clitics are defined as <f name="baseType"><symbol value="clitic"/></f> AND <f name="cliticType"><symbol value="inflectional"/></f>
4. List of Particles
   - all particles (particles.xml)
It makes sense to have the List of Root Morphemes because it provides different information from the Root-based Index. The Root-based Index lists all the words in the dictionary organized by root, with morphological breakdowns; the List of Root Morphemes in the Appendix is simply a list of all the root morphemes. It is therefore a subset of the information in the Root-based Index, but in listing only morphemes it parallels the other three appendices, which are lists of different categories of morphemes. So the Appendices will list all the individual morphemes in the dictionary.
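ECH's grouping amounts to a small classification rule. Here is a minimal Python sketch of it. It assumes the feature structure is available as an un-namespaced <fs> and that each entry's source filename is known. The post doesn't name the five grammatical-affix files, so they are passed in as a parameter. The filenames used with it below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Set

LEXICAL_AFFIX_FILES = {"lex-pref.xml", "lex-suf.xml"}
PARTICLE_FILES = {"particles.xml"}


def features(fs: ET.Element) -> Dict[str, str]:
    """Map each <f name="..."> to the value of its <symbol>."""
    out = {}
    for f in fs.findall("f"):
        sym = f.find("symbol")
        if sym is not None:
            out[f.get("name")] = sym.get("value")
    return out


def appendix_for(source_file: str, fs: ET.Element,
                 grammatical_affix_files: Set[str]) -> Optional[str]:
    """Return the appendix an entry belongs in, or None if it belongs in none."""
    feats = features(fs)
    if feats.get("baseType") == "root":  # roots only, not stems
        return "List of Root Morphemes"
    if source_file in LEXICAL_AFFIX_FILES:
        return "List of Lexical Affixes"
    inflectional_clitic = (feats.get("baseType") == "clitic"
                           and feats.get("cliticType") == "inflectional")
    if source_file in grammatical_affix_files or inflectional_clitic:
        return "List of Grammatical Morphemes"
    if source_file in PARTICLE_FILES:
        return "List of Particles"
    return None
```

The checks run in the order of ECH's list, so a root is never captured by a later category.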
I'm working on SMK's instructions for hyphs here. I've implemented the first part, which is easy: it's just a search-and-replace on strings. But I'm struggling with the second part, mainly because I don't understand the examples properly. My questions are below; waiting for clarification from ECH.
[INSTRUCTIONS]
-- when generating the translated hyph,
a) Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"):
For example: [[√ʔiɬ<CVC>n-úl • √eat<char>-attrib]]
BUT, if the root has no gloss, DO keep the second part of the root:
For example: [[k-√cúwˀ<CVC>x=ánaʔ • loc-√cúwˀ<char>x=ear]]
b) Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root.
For example: [[√p<a>tix̣ʷ • <rep>√test]]
Again, if the root has no gloss, keep the first part of the root.
For example: [[√p<a>tix̣ʷ • √p<rep>tix̣ʷ]]
[/INSTRUCTIONS]
The first example comes from this (I'll pretty-print the hyph for clarity):
<hyph>
  √
  <m corresp="m:ʔiɬn">ʔiɬ</m>
  +
  <m corresp="m:CHAR">CVC</m>
  +
  <m corresp="m:ʔiɬn">n</m>
  -
  <m corresp="m:ul">úl</m>
</hyph>
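Question 1 turns on the text nodes between <m> elements, so it may help to see the hyph as a flat sequence. This is a minimal Python sketch, assuming an un-namespaced <hyph> whose children are all <m> elements. ElementTree exposes the delimiter before the first <m> as the parent's .text. Each delimiter after an <m> is that element's .tail.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def hyph_tokens(xml_text: str) -> List[Tuple[str, ...]]:
    """Flatten a <hyph> into ("t", delimiter) and ("m", entry_id, text) tuples."""
    hyph = ET.fromstring(xml_text)
    tokens: List[Tuple[str, ...]] = []
    if hyph.text and hyph.text.strip():
        tokens.append(("t", hyph.text.strip()))
    for m in hyph:  # assumes every child is an <m>
        ref = m.get("corresp", "")
        tokens.append(("m", ref.removeprefix("m:"), (m.text or "").strip()))
        if m.tail and m.tail.strip():
            tokens.append(("t", m.tail.strip()))
    return tokens
```

For the hyph above, this gives √, then ʔiɬ (ʔiɬn), +, CVC (CHAR), +, n (ʔiɬn), -, úl (ul). The + and - delimiters are therefore distinct tokens that a matcher can check rather than skip.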
Question 1: Can I ignore the intervening characters between the <m> elements for the purposes of detecting infixes? For instance, can I search for a sequence of:
<m>rootX</m> <m>CHAR</m> <m>rootX</m>
and be sure it's OK to delete the second root, regardless of what text nodes happen to intervene? Or might there be instances such as
<m>rootX</m>-<m>CHAR</m>-<m>rootX</m>
where there are hyphens instead of + characters, the relationship is entirely different, and the deletion should not be triggered?
Question 2: I'm a bit confused about the idea of retaining the second root if it has no gloss. Why? The example comes from this hyph:
<hyph>
  <m corresp="m:k-LOC">k</m>
  -√
  <m corresp="m:cuwx">cúwˀ</m>
  +
  <m corresp="m:CHAR">CVC</m>
  +
  <m corresp="m:cuwx">x</m>
  =
  <m corresp="m:anaʔ">ánaʔ</m>
</hyph>
and the entry xml:id="cuwx" is indeed lacking a gloss (it's an inferred entry). But if we delete reduplicated roots in most cases, but not in this one, aren't people going to assume that the second instance of the morpheme, which shows up as "x", is something else entirely, because they will assume that a second instance has already been deleted, as it would be in most normal cases? Are we expecting people to distinguish between a case where a root disappears because it has a gloss, and one where it doesn't disappear because it doesn't have a gloss? That seems extremely confusing to me. I would naturally assume that if reduplicated roots are normally deleted, that's the case here too, and the "x" is a subsequent and completely different morpheme (especially since it bears no resemblance to the first instance, "cúwˀ").
Today:
- Fixed naming of particle index.
- Split out lexical affix index, particle index and root index into separate page-sequences so they can have appropriate running headers.
- Fixed some display spacing issues with translated hyphs (compensating for superfluous spaces in data).
- Fixed a bug in the xsl:key used to look up glosses, so glosses now appear in translated hyphs for lexical items that have them.
- Fixed a problem with page-masters for front matter and appendices (page-masters were not properly configured for recto and verso).
- Fixed a blank-page bug (referenced master was not there, so page was unselectable in PDF output).
- Began work on handling of various infixes (this will be very complicated).
Also, GN hacked our fonts to add subscript 1 and 2, since we need these, and the font author has not responded to our requests.
... the clitic index is actually a particle index. Relabelled and renamed variables accordingly. More to come on this...