I've finally had a chance to look at your questions in detail:
1. In the inchoative entry, one of the allomorphs (the glottal stop) is an infix, while the other allomorph (the -p) is a suffix. In the feature structure it is possible to have two different symbol values for the type of morpheme, so I used this possibility to list the inchoative morpheme as being both an infix and a suffix. Question for Martin: How can we indicate in the database which allomorph is an infix and which is a suffix? Does this pose any kind of problem for the database?
This is a question I hadn't thought about before, because it hadn't occurred to me that there would be two different morpheme types for one morpheme. However, a relatively simple solution suggests itself:
<form type="allomorph" n="1"> ... </form>
<form type="allomorph" n="2"> ... </form>
...
<fs>
  <f name="baseType">
    <symbol value="infix" n="1"/>
    <symbol value="suffix" n="2" />
  </f>
  ...
</fs>
Then I can write code to detect the presence of the n attributes, and link the correct form to the correct symbol value. I've added this to the documentation, and I've also posted a link to the documentation on the site.
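A minimal sketch of that linking step, in Python with the standard library's ElementTree, might look like the following. It is only an illustration under stated assumptions: the wrapping <entry> element, the text inside the <form> elements, and the function name link_allomorph_types are hypothetical, and the sample is parsed without a namespace, which the real files may not allow.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names follow the example above,
# but the <entry> wrapper and the form contents are placeholders.
SAMPLE = """
<entry>
  <form type="allomorph" n="1">-?-</form>
  <form type="allomorph" n="2">-p</form>
  <fs>
    <f name="baseType">
      <symbol value="infix" n="1"/>
      <symbol value="suffix" n="2"/>
    </f>
  </fs>
</entry>
"""


def link_allomorph_types(entry):
    """Map each allomorph form's text to the symbol value sharing its n attribute."""
    # Index the baseType symbol values by their n attribute.
    types_by_n = {
        sym.get("n"): sym.get("value")
        for sym in entry.findall("./fs/f[@name='baseType']/symbol")
        if sym.get("n") is not None
    }
    linked = {}
    for form in entry.findall("./form[@type='allomorph']"):
        # A form with no n, or with no matching symbol, maps to None.
        linked[(form.text or "").strip()] = types_by_n.get(form.get("n"))
    return linked


if __name__ == "__main__":
    print(link_allomorph_types(ET.fromstring(SAMPLE)))
```

Entries that have only one morpheme type would carry no n attributes at all, so code along these lines would need a fallback that applies the single symbol value to every form.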
2. This has to do with the markup of glosses in illustrations (dictegs). Question for Martin: Do we want the English-Nx wordlist to be able to access illustration glosses, and if so, how do we mark this up? Can we use the same system of segs and glosses?
My original intention was that the gloss tag would be used inside a <def><seg> tag to signal a word or phrase which could be used to create the English-Nx wordlist, and that the wordlist would be constructed only based on <gloss> tags occurring in that context. In the <dicteg> tags, we're using <gloss> for something else:
If a gloss for the illustration is required, it can be included in the <quote> tag with a <gloss> tag, like this:
<cit>
  <quote>The quoted illustration<gloss>Translation of the illustration</gloss></quote>
  ...
</cit>
(from our guidelines)
Therefore it seems to me that using <gloss> in a different way inside the <dicteg> will be confusing. I took a look at s-rtr.xml, and I found some bits that look like this:
<cit><!--check stress on this one-->
  <quote>
    <phr type="phonemic">ni?c'ikus ??p?iԿ?</phr>
    <phr type="narrow">ne?c?ikos ??p?l?</phr>
    <seg>whole wheat <gloss>flour</gloss></seg>
  </quote>
  <bibl>Y41.7</bibl>
</cit>
This doesn't look anything like the guidelines, so I'm wondering what happened here. Was this based on the code already in the file, or did you construct this format with <phr> and <seg> tags?
It seems to me that if there's a word or phrase that can serve as a direct English equivalent to the headword appearing in an illustration, it might as well be in the <def> element, wrapped in a <gloss> tag; is there any good reason to take material from the illustrations for the English-Nx glossary?
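To make the intended rule concrete (only <gloss> tags inside <def><seg> feed the English-Nx wordlist, while glosses inside illustrations are ignored), here is a rough Python sketch. The sample entry, its xml:id, its contents, and the function name wordlist_glosses are all hypothetical, and the sample again assumes no namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entry: one gloss in the definition, one in an illustration.
SAMPLE = """
<entry xml:id="example1">
  <def><seg>to become <gloss>sweet</gloss></seg></def>
  <dicteg>
    <cit>
      <quote>The quoted illustration<gloss>Translation of the illustration</gloss></quote>
    </cit>
  </dicteg>
</entry>
"""


def wordlist_glosses(entry):
    """Return (gloss text, entry id) pairs for <gloss> tags inside <def><seg> only."""
    entry_id = entry.get(XML_ID)
    pairs = []
    # The path requires the def/seg context, so <quote><gloss> is never matched.
    for gloss in entry.findall(".//def/seg/gloss"):
        text = "".join(gloss.itertext()).strip()
        if text:
            pairs.append((text, entry_id))
    return pairs


if __name__ == "__main__":
    print(wordlist_glosses(ET.fromstring(SAMPLE)))
```

Because the selection depends on context rather than on the tag name alone, the same <gloss> tag can keep its separate role as a translation inside <quote> without leaking into the wordlist.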
3. Cross-references: Here are the two different ways of doing cross-references. They are from the same entry, -?-/-p 'inchoative'. Note that the format in (a) does not provide a gloss for the cross-reference. Presumably this is because the gloss is meant to be determined by looking at the entry of the word that is referred to in the cross-reference. The effect of xr is to point to the xml:id, and thus to the entry, of the referred-to word. The format in (b) does provide a gloss, but does not point to the entry of the referred-to word. I assume that (a) is actually the format that we want to be following, but your input is needed here, Martin.
(a)
<dicteg>
  <cit>
    <quote>s-vt?a+?+x-m<gloss>it is getting sweet</gloss> </quote>
    <bibl><!--[No source]--></bibl>
  </cit>
  <xr>See<ref target="t??x">t??ո</ref></xr>
</dicteg>
(b)
<dicteg>
  <cit>
    <quote>vk??մ?-p<gloss>rope breaks</gloss>
      <note>cf. k?k'?մ'?n ' break a line'</note>
    </quote>
    <bibl><!--[No source]--></bibl>
  </cit>
</dicteg>
Our documentation shows this example:
<xr>See <ref target="idblah">Blah</ref> (English blah) and <ref target="idblah2">blah2</ref> (English blah2).</xr>
The intention is that the gloss, if needed, be simply in brackets. I think the structure quoted in your question is the result of the automatic conversion code doing the best it could with the source material; in this case, there was no gloss for the cross-reference encoded with <xr>, and the second cross-reference was simply not encoded properly in the original source. If there's a difference between a link introduced by "See" and one introduced by "cf.", then we'll need to elaborate the tagging system a bit, but I suspect in this case the second cross-reference should be re-encoded using an <xr> tag.
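If it would help to find other cases like (b), a small audit script could list <note> elements that begin with "cf." (candidates for re-encoding as <xr>) and <ref> targets that do not match any xml:id in the file. The Python sketch below is only an illustration: the sample markup, the ids, and the function name audit_cross_references are hypothetical, and it assumes targets are bare ids or ids prefixed with "#", in a file parsed without a namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one "cf." note and one <ref> whose target has no entry.
SAMPLE = """
<div>
  <entry xml:id="idblah">
    <dicteg>
      <cit><quote>example one<note>cf. blah3 'English blah3'</note></quote></cit>
      <xr>See <ref target="idblah2">blah2</ref> (English blah2).</xr>
    </dicteg>
  </entry>
  <entry xml:id="idblah3"/>
</div>
"""


def audit_cross_references(root):
    """Return (unresolved ref targets, texts of notes that start with 'cf.')."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    unresolved = [
        ref.get("target")
        for ref in root.iter("ref")
        # Accept both "id" and "#id" forms of the target attribute.
        if ref.get("target", "").lstrip("#") not in ids
    ]
    cf_notes = []
    for note in root.iter("note"):
        text = "".join(note.itertext()).strip()
        if text.lower().startswith("cf."):
            cf_notes.append(text)
    return unresolved, cf_notes


if __name__ == "__main__":
    unresolved, cf_notes = audit_cross_references(ET.fromstring(SAMPLE))
    print("Unresolved targets:", unresolved)
    print("Notes to re-encode as <xr>:", cf_notes)
```

A report like this only flags candidates; deciding whether a given "cf." note really is a cross-reference would still be a manual call.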