Nxaʔamxcín (Moses) Dictionary Blog

March 16, 2007

pron segs continued

Posted by on 16 Mar 2007 in Activity log

From looking at your answer to my first question, Martin, I think I may not have explained myself very well, so I'll try again. Both of my questions are exemplified in the following sample from the s-rtr.xml file. Note that both forms have the same pronunciation type: both are phonemic, identical to each other, and identical to the first phonemic example of the entry word and to the shape that the word takes in the hyphenated element. Although the two forms have different glosses and sources, they are part of the same entry because they are just variants of the same word. The question is, in this kind of case, should we specify the pronunciation segment type as "phonemic"? Does this confuse anything? Is it necessary? How will it affect the search function?

               <form>
                  <pron>
                     <seg type="phonemic">ṣə̣́nṣə̣nt</seg>
                  </pron>
               </form>
               <sense>
                  <def>
                     <seg>
                        <gloss>tame</gloss>
                     </seg>
                     <bibl>EP2.68.8</bibl>
                  </def>
               </sense>
               <form>
                  <pron>
                     <seg type="phonemic">ṣə̣́nṣə̣nt</seg>
                  </pron>
               </form>
               <sense>
                  <def>
                     <seg> it is
                        <gloss>tame</gloss> or <gloss>gentle</gloss>
                     </seg>
                     <bibl>JM3.21.11</bibl>
                  </def>
               </sense>

As far as the question about English glosses is concerned, I understand that the asterisks should be removed, and I will do that. But I don't entirely understand how the gloss function works. If you look at the last example just above, the meaning is contained within two types of tags: the one referring to "segment" and the one referring to the gloss itself. I'm wondering if you could define for me exactly what the gloss tags should contain. In this case, for instance, you've got 'it is' and 'or' outside of the gloss tags. They are also part of the meaning, but they do not need to be targeted in an English-Nx rendition of the word list. We would search for 'tame' or 'gentle', not 'it is tame or gentle'. I hope I am making myself clear. So if I go back to all the glosses, remove the asterisks, and add gloss tags, how exactly do I position the gloss tags?
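
To make the question concrete, here is a minimal sketch of how an English-Nx index could treat the sample above, assuming that only the text inside gloss tags is indexed while the whole seg is still shown as the definition. It is written in Python with the standard library purely for illustration; the function names index_terms and full_definition are hypothetical and are not part of the project's code.

```python
import xml.etree.ElementTree as ET

# The second sense from the sample above, closed off so that it parses.
SAMPLE = """<sense>
  <def>
    <seg> it is
      <gloss>tame</gloss> or <gloss>gentle</gloss>
    </seg>
    <bibl>JM3.21.11</bibl>
  </def>
</sense>"""


def index_terms(sense):
    """Return only the words wrapped in gloss tags (the English search keys)."""
    return [g.text.strip() for g in sense.iter("gloss") if g.text]


def full_definition(sense):
    """Return all the text of the seg, with or without gloss tags around it."""
    seg = sense.find("def/seg")
    # itertext() walks the text inside and between the gloss children;
    # split/join collapses the indentation whitespace.
    return " ".join("".join(seg.itertext()).split())


sense = ET.fromstring(SAMPLE)
terms = index_terms(sense)
definition = full_definition(sense)
print(terms)
print(definition)
```

Under that assumption, 'it is' and 'or' stay in the displayed definition but never become search keys, so the gloss tags would wrap only the words someone would look up.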

March 14, 2007

Responses to 2 issues

Posted by on 14 Mar 2007 in Activity log

This is my take on Ewa's questions in the preceding post:

1. I discovered that in some entries which have different forms the following occurs: we have several pronseg cases, each of which is the same, and associated with each pronseg we have a unique definition. The pronseg's are all phonemic so in the past we did not mark them as type="phonemic" since they did not contrast with type="narrow". Should they be marked as type="phonemic"?

If I understand this correctly, the forms are the same in all cases, but the definitions are different. In this case, I suspect that they should be different entries, shouldn't they? If they're the same entry, then don't they need only one form, and multiple definitions?

Incidentally, in our documentation from last July, it says that we should be using "broad" vs "narrow", rather than "phonemic". Did we change our minds on this? The docs actually show that "broad" is the default, so you wouldn't need to add it. Could you read through the PDF and let me know if what's described there differs from the existing markup you're working on now?

2. We have at least two different ways of dealing with marking up glosses so as to create, eventually, English to Nx lists. First, we still have lots of cases which were marked up in lexware with an asterisk. But we also have cases like the following definition "it is tame or gentle" where it has been marked up with two "gloss" tags. Do we want to have a consistent way of marking up the meanings now, or shall we leave that task until a later date, given that there is already so much work in marking up already?

My automatic conversion code should have converted any * items into gloss tags (* means nothing in the context of XML), but if you had already started work on this file when I added that feature, it wouldn't have been converted. The gloss tags need to be added, though.
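
For files that missed the automatic conversion, the step can be sketched roughly as follows. This is not the actual conversion code: it assumes the asterisk sits immediately before the word it marks, and the function name asterisks_to_gloss and the input string are made up for illustration.

```python
import re
from xml.sax.saxutils import escape


def asterisks_to_gloss(definition):
    """Wrap each asterisk-marked word in gloss tags.

    Assumption: the asterisk comes directly before the marked word,
    as in "*tame". Adjust the pattern if the data uses another convention.
    """
    # Escape &, < and > first so the result is safe to place in XML.
    escaped = escape(definition)
    return re.sub(r"\*(\w+)", r"<gloss>\1</gloss>", escaped)


# Hypothetical input, modelled on the "it is tame or gentle" definition.
converted = asterisks_to_gloss("it is *tame or *gentle")
print(converted)
```

A definition with no asterisks passes through unchanged, so running this over entries that have already been converted should do no harm.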

A couple of points that came up today

Posted by on 14 Mar 2007 in Activity log

1. I discovered that in some entries which have different forms the following occurs: we have several pronseg cases, each of which is the same, and associated with each pronseg we have a unique definition. The pronseg's are all phonemic so in the past we did not mark them as type="phonemic" since they did not contrast with type="narrow". Should they be marked as type="phonemic"?

2. We have at least two different ways of dealing with marking up glosses so as to create, eventually, English to Nx lists. First, we still have lots of cases which were marked up in lexware with an asterisk. But we also have cases like the following definition "it is tame or gentle" where it has been marked up with two "gloss" tags. Do we want to have a consistent way of marking up the meanings now, or shall we leave that task until a later date, given that there is already so much work in marking up already?

Discussion continued

Posted by on 14 Mar 2007 in Activity log

I think that it is a good idea to write the code so that, in the absence of a phonemic element in a particular form element, it will look at preceding-sibling form elements until it finds one. So it would be great if you could do this, Martin.

March 13, 2007

Discussion on the handling of redundant seg elements

Posted by on 13 Mar 2007 in Activity log, Tasks

Ewa is working on the s-rtr.xml file, and has noticed this:

"in a number of entries (e.g., saplil 'flour') there are several form elements, either because there is more than one phonetic transcription attested for the entry, or there is more than one source for the entry, or there is more than one gloss for the entry. The first form element in an entry like this will have a seg type="phonemic" and, if it is attested, a seg type="narrow". But should the remaining form elements have both seg types, when this means repeating the exact same seg type="phonemic" over and over again?"

I can certainly write the code so that in the absence of a phonemic element in a particular form element, it will look at preceding-sibling form elements until it finds one, and use that instead. If that's always going to be the right thing to do, it shouldn't be too hard to implement. Adding this as a task for April: check whether there is already any handling for this, and if not, implement it.
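
As a rough sketch of that fallback (not the code that will go into the site), here it is in Python with the standard library. The entry wrapper element, the function name phonemic_for and the placeholder transcriptions are all assumptions for illustration. ElementTree has no preceding-sibling axis, so the sketch walks backwards through the list of form elements instead.

```python
import xml.etree.ElementTree as ET

# Placeholder entry: the second form has no phonemic seg of its own.
SAMPLE = """<entry>
  <form>
    <pron>
      <seg type="phonemic">PHONEMIC-A</seg>
      <seg type="narrow">NARROW-1</seg>
    </pron>
  </form>
  <form>
    <pron>
      <seg type="narrow">NARROW-2</seg>
    </pron>
  </form>
</entry>"""


def phonemic_for(forms, index):
    """Return the phonemic text for forms[index].

    Checks the form itself first, then each preceding form in turn,
    and returns None if no phonemic seg is found at all.
    """
    for form in reversed(forms[: index + 1]):
        seg = form.find("pron/seg[@type='phonemic']")
        if seg is not None:
            return seg.text
    return None


entry = ET.fromstring(SAMPLE)
forms = entry.findall("form")  # direct children, in document order
resolved = [phonemic_for(forms, i) for i in range(len(forms))]
print(resolved)
```

The nearest preceding form wins, so an entry whose later forms do carry their own phonemic seg would still use those rather than the first one.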

February 1, 2007

Ported Moses to the new Cocoon/eXist

Posted by on 01 Feb 2007 in Activity log

This went without a hitch. Uploaded the XML files (after a couple of hiccups till I realized I can't upload the biggest affix.xml file -- it's not finished or well-formed). Then Greg arranged to have the moses home folder moved out of tapor into home1t, and I created a Webapps directory in it, and pushed up the site materials. Worked out of the box.

January 31, 2007

COMPLETED TASK: Move Moses project to new Cocoon/eXist

Posted by on 31 Jan 2007 in Tasks

Completed Feb 1: This task is essentially the same as the one described for the Mariage project here.

December 18, 2006

Convert all phonetic hyph elements to phonemic in s-rtr.xml

Posted by on 18 Dec 2006 in Tasks

This update will render s-rtr.xml a complete working file (the only one so far).

Further required change: hyph

Posted by on 18 Dec 2006 in Activity log

The hyph form is currently expressed as a narrow transcription where one is attested, but as phonemic where it's not, without any indication of which is the case. It should actually be a broken-down rendering of the phonemic form. This change will need to be made throughout the markup already completed.

The existing hyphs were created automatically from the original data, where it was not specified whether the transcription was phonemic or attested-narrow, so there's no way to fix this mechanically. We'll continue rendering what's in the original data into m tags, but Ewa will look at each hyph when editing the data and, if it's narrow, will reformulate it as phonemic.

This change has not been made in any of the existing files yet. s-rtr.xml has been changed in other ways (the addition of phonemic pron/seg elements), so it is the most up-to-date, but it has no hyph updates as yet.

Change to displayed form

Posted by on 18 Dec 2006 in Activity log

The list of entries is now displayed with the headword taken from the form[1]/pron[1]/seg[1] form rather than the hyph. This makes for a cleaner display, because this is (will be) the phonemic form. Currently, however, the affixes in the db don't have phonemic forms, so they're displaying with whatever constitutes the first seg in their form[1]/pron[1] element.
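
The selection can be sketched in Python as below, using the same form[1]/pron[1]/seg[1] path. The entries and entry wrapper elements, the function name headword and the placeholder transcriptions are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Two placeholder entries: a word with a phonemic seg first,
# and an affix whose only seg has no type.
SAMPLE = """<entries>
  <entry>
    <form>
      <pron>
        <seg type="phonemic">PHONEMIC-A</seg>
        <seg type="narrow">NARROW-A</seg>
      </pron>
    </form>
    <form>
      <pron>
        <seg type="narrow">NARROW-A2</seg>
      </pron>
    </form>
  </entry>
  <entry>
    <form>
      <pron>
        <seg>AFFIX-B</seg>
      </pron>
    </form>
  </entry>
</entries>"""


def headword(entry):
    """Return the text of the first seg in the first pron of the first form."""
    seg = entry.find("form[1]/pron[1]/seg[1]")
    return seg.text if seg is not None else None


root = ET.fromstring(SAMPLE)
headwords = [headword(e) for e in root.findall("entry")]
print(headwords)
```

The path picks whatever seg comes first and does not check its type, which is why the affix entries show up with their first seg whether or not it is phonemic.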

Nxaʔamxcín (Moses) Dictionary Blog

This is an XML dictionary project based primarily on the materials compiled by the late M. Dale Kinkade during fifteen years of work in the 1960’s and 1970’s with more than a dozen native speakers of the language, but it also includes materials compiled by Ewa Czaykowska-Higgins in the early 1990’s.
