<form>
  <pron>
    <seg type="phonemic">ṣə̣́nṣə̣nt</seg>
  </pron>
</form>
<sense>
  <def>
    <seg> <gloss>tame</gloss> </seg>
    <bibl>EP2.68.8</bibl>
  </def>
</sense>
<form>
  <pron>
    <seg type="phonemic">ṣə̣́nṣə̣nt</seg>
  </pron>
</form>
<sense>
  <def>
    <seg> it is <gloss>tame</gloss> or <gloss>gentle</gloss> </seg>
    <bibl>JM3.21.11</bibl>
  </def>
</sense>
As far as the question about English glosses is concerned, I understand that the asterisks should be removed, and I will do that. But I don't entirely understand how the gloss function works. If you look at the last example just above, the meaning is contained within two types of tags: the one referring to "segment" and the one referring to the gloss itself. Could you define for me exactly what the gloss tags should contain? In this case, for instance, you've got 'it is' and 'or' outside of the gloss tags. They are also part of the meaning, but they do not need to be targeted in an English-Nx rendition of the word-list: we would search for 'tame' or 'gentle', not 'it is tame or gentle'. I hope I am making myself clear. So if I go back to all the glosses and remove asterisks and add gloss tags, how exactly do I position the gloss tags?
This is my take on Ewa's questions in the preceding post:
1. I discovered that in some entries which have different forms the following occurs: we have several pronseg cases, each of which is the same, and associated with each pronseg we have a unique definition. The pronseg's are all phonemic so in the past we did not mark them as type="phonemic" since they did not contrast with type="narrow". Should they be marked as type="phonemic"?
If I understand this correctly, the forms are the same in all cases, but the definitions are different. In this case, I suspect that they should be different entries, shouldn't they? If they're the same entry, then don't they need only one form, and multiple definitions?
Incidentally, in our documentation from last July, it says that we should be using "broad" vs "narrow", rather than "phonemic". Did we change our minds on this? The docs actually show that "broad" is the default, so you wouldn't need to add it. Could you read through the PDF and let me know if what's described there differs from the existing markup you're working on now?
2. We have at least two different ways of marking up glosses so as to create, eventually, English to Nx lists. First, we still have lots of cases which were marked up in lexware with an asterisk. But we also have cases like the definition "it is tame or gentle", which has been marked up with two "gloss" tags. Do we want to settle on a consistent way of marking up the meanings now, or shall we leave that task until a later date, given how much markup work there already is?
My automatic conversion code should have converted any * items into gloss tags (* means nothing in the context of XML), but if you had already started work on this file when I added that feature, it wouldn't have been converted. The gloss tags need to be added, though.
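For files that missed the automatic conversion, the asterisk-to-gloss step could be sketched roughly as below. This is only an illustration, not the actual conversion code: it assumes that an asterisk sits directly in front of each single word to be indexed (as in "it is *tame or *gentle"), and the function name stars_to_gloss is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumption: an asterisk directly before a word marks that one word
# for the English index. The real lexware convention may differ.
STARRED_WORD = re.compile(r"\*(\w[\w'-]*)")


def stars_to_gloss(definition: str) -> str:
    """Rewrite each *word as <gloss>word</gloss>, escaping all other text."""
    parts = []
    last = 0
    for match in STARRED_WORD.finditer(definition):
        # Text before the starred word is kept, XML-escaped.
        parts.append(escape(definition[last:match.start()]))
        parts.append("<gloss>" + escape(match.group(1)) + "</gloss>")
        last = match.end()
    # Trailing text after the last starred word.
    parts.append(escape(definition[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(stars_to_gloss("it is *tame or *gentle"))
```

Under that assumption, words such as 'it is' and 'or' stay inside the seg but outside the gloss tags, which matches the hand-marked example quoted above.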
Ewa is working on the s-rtr.xml file, and has noticed this:
"in a number of entries (e.g., saplil 'flour') there are several form elements, either because there is more than one phonetic transcription attested for the entry, or there is more than one source for the entry, or there is more than one gloss for the entry. The first form element in an entry like this will have a seg type="phonemic" and, if it is attested, a seg type="narrow". But should the remaining form elements have both seg types, when this means repeating the exact same seg type="phonemic" over and over again?"
I can certainly write the code so that in the absence of a phonemic element in a particular form element, it will look at preceding-sibling form elements until it finds one, and use that instead. If that's always going to be the right thing to do, it shouldn't be too hard to implement. Adding this as a task for April: check whether there is already any handling for this, and if not, implement it.
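A minimal sketch of that fallback, in Python with the standard library rather than whatever the site code actually uses: walking the form elements in order and carrying the last phonemic seg forward gives each form the value from its nearest preceding sibling that has one. The entry wrapper element, the function name, and the transcription strings are placeholders for illustration, not real data.

```python
import xml.etree.ElementTree as ET


def phonemic_for_each_form(entry):
    """Return one phonemic string (or None) per <form> child, in order.

    A form without its own phonemic seg inherits the value from the
    nearest preceding sibling form that has one.
    """
    resolved = []
    last_seen = None
    for form in entry.findall("form"):
        seg = form.find("pron/seg[@type='phonemic']")
        if seg is not None and seg.text:
            last_seen = seg.text
        resolved.append(last_seen)
    return resolved


# Placeholder entry: only the first form carries a phonemic seg.
SAMPLE = """<entry>
  <form><pron><seg type="phonemic">abc</seg><seg type="narrow">abc1</seg></pron></form>
  <form><pron><seg type="narrow">abc2</seg></pron></form>
  <form><pron><seg type="phonemic">xyz</seg></pron></form>
  <form/>
</entry>"""

if __name__ == "__main__":
    print(phonemic_for_each_form(ET.fromstring(SAMPLE)))
```

A form that precedes any phonemic seg resolves to None here; whether that case should be flagged as an error depends on whether the fallback really is always the right thing to do.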
This went without a hitch. Uploaded the XML files (after a couple of hiccups till I realized I can't upload the biggest affix.xml file -- it's not finished or well-formed). Then Greg arranged to have the moses home folder moved out of tapor into home1t, and I created a Webapps directory in it, and pushed up the site materials. Worked out of the box.
Completed Feb 1: This task is essentially the same as the one described for the Mariage project here.
The hyph form is currently expressed as a narrow transcription where one is attested, but as phonemic where it's not, without any indication of which is the case. It should actually be a broken-down rendering of the phonemic form. This change will need to be made throughout the markup already completed.
The existing hyphs were created automatically from the original data, where it was not specified whether the transcription was phonemic or attested-narrow, so there's no way to fix this mechanically. We'll continue rendering what's in the original data into m tags, but Ewa will look at each hyph when editing the data and, if it's narrow, will reformulate it as phonemic.
This change has not been made in any of the existing files yet. s-rtr.xml has been changed in other ways (the addition of phonemic pron/seg elements), so it is the most up-to-date file, but it has no hyph updates as yet.