Meeting about glosses and wordlists
Posted by mholmes on 18 Oct 2010 in Activity log
Productive meeting this morning, giving rise to the following ideas:
- The "entries" and "wordlist" views should in fact be merged, such that the entries are presented initially in the form that the wordlist now takes (with the headword followed by all its glosses), expanding to a full entry when clicked on.
- Duplicate glosses will be removed; there's no reason for the same word or phrase to be wrapped in a <gloss> tag more than once in the same entry. I'll see about generating a list of duplicates using XQuery to help in detecting them.
- In the "entries" view (Moses), affixes should not display any glosses, because where <gloss> tags are contained inside them, they're actually glossing a word containing the affix rather than the affix itself. Instead, the feature structure information explaining the function of the affix should be displayed, in a manner which makes it clearly distinct from regular glosses, and the affix head form itself should be distinguished visually so that it's clear that it's not a regular word.
- The English-Moses wordlist is to be replaced by a serious attempt to produce something along the lines of a regular English-Moses dictionary, generated automatically from the DB. This will quite possibly be as unsatisfactory as previous projects which used Lexware to generate the same sort of dictionary, but we may be able to do a more sophisticated job than that. Ultimately, when the initial phase of the work on the core code is complete, we may add some markup to <dicteg>s which enables us to generate a more complete set of English headwords by harvesting English words and Moses equivalents from <dicteg>s. (The example here is "flannel", inside the current entry for "cloth"; the Moses equivalent is a phrase containing "cloth", so it doesn't have its own entry in Moses, but in English it probably should.)
- DONE: The Moses-to-English basic view should not use words from <gloss> tags; instead, it should use the content of the <def>/<seg> elements as the "gloss" of the word.
- DONE: Unattested glosses should never be displayed at all in the Moses-English view. They're only to be used when constructing the English-Moses view.
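
As a rough illustration of the duplicate-gloss check mentioned above, here is a minimal sketch. It is written in Python with the standard library rather than XQuery, purely so it can be run anywhere; the real report would presumably be an XQuery against the DB. The sample data, the <entries> wrapper, the id attribute, and the case-insensitive, whitespace-normalised comparison are all assumptions made for the example, not the project's actual schema or rules.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data. The element names <entry>, <gloss>, <def> and <seg>
# echo the post, but the wrapper element, the id attribute and the overall
# layout are assumed for illustration only.
SAMPLE = """
<entries>
  <entry id="e1">
    <gloss>cloth</gloss>
    <def><seg>a piece of <gloss>cloth</gloss></seg></def>
    <gloss>fabric</gloss>
  </entry>
  <entry id="e2">
    <gloss>water</gloss>
  </entry>
</entries>
"""


def normalise(text):
    """Collapse whitespace and lowercase (an assumed notion of 'same gloss')."""
    return " ".join(text.split()).lower()


def duplicate_glosses(root):
    """Map each entry id to the gloss texts that occur more than once in it."""
    report = {}
    for entry in root.iter("entry"):
        # Count every <gloss> anywhere inside the entry, at any depth.
        counts = Counter(
            normalise("".join(gloss.itertext()))
            for gloss in entry.iter("gloss")
        )
        dups = sorted(text for text, n in counts.items() if n > 1 and text)
        if dups:
            report[entry.get("id")] = dups
    return report


root = ET.fromstring(SAMPLE)
print(duplicate_glosses(root))  # {'e1': ['cloth']}
```

Entries with no repeated gloss are left out of the report, so the output is just the list of entries needing attention. Whether two glosses differing only in case or spacing should count as duplicates is a judgment call for the project; the normalise function is the one place to change that.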