Responses on the English-NX Wordlist issue
I'm still looking in detail at your postings below, and I'm not sure I've grasped the issue fully yet, but I think it would help if I explain how I envisage the English-NX wordlist system working (in fact, the only way I can envisage it working at the moment).
The intention, as I understand it, is to produce a wordlist, not a dictionary. In other words, the output will be a list of English words and phrases in alphabetical order, each with an equivalent NX word or phrase. This would be achieved as follows:
- Find each <gloss> tag which is intended to be a wordlist entry. (This means that we have to disambiguate <gloss> tags which are intended to be for the wordlist from those which aren't; that can only be done on the basis of their context, or failing that, because they have a particular attribute added to them which distinguishes them.)
- For each such <gloss> tag, find the nearest appropriate NX word or phrase in the tree which is equivalent to it. (I had understood this to mean going up the tree to the <entry> level, then taking the first <seg> in the first <pron> in the first <form> element in the entry.)
This obviously requires that any <gloss> tag we're going to use for this purpose contain an English word or phrase that IS equivalent to the <seg> element as described above. If it's not going to be equivalent, then the question arises: what is it a gloss for?
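To make sure we mean the same thing, here is a rough sketch of the process as I've described it, written in Python with the standard xml.etree.ElementTree module. It is only an illustration, not a proposal for the real implementation. The wordlist="yes" attribute, the <dictionary> root element, the function names and the sample data (including the placeholder NX forms) are all invented for the example, and I'm assuming that <form>, <pron> and <seg> are nested directly inside one another.

```python
import xml.etree.ElementTree as ET

# Invented sample data: the NX forms are placeholders, and the
# wordlist="yes" attribute is a hypothetical way of marking the
# <gloss> tags that are meant for the wordlist.
SAMPLE = """
<dictionary>
  <entry>
    <form><pron><seg>nx-one</seg></pron></form>
    <sense>
      <gloss wordlist="yes">water</gloss>
      <gloss>a longer explanatory gloss, not for the wordlist</gloss>
    </sense>
  </entry>
  <entry>
    <form><pron><seg>nx-two</seg><seg>nx-two-variant</seg></pron></form>
    <sense><gloss wordlist="yes">bear</gloss></sense>
  </entry>
</dictionary>
"""


def first_seg(entry):
    """Return the first <seg> in the first <pron> in the first <form>, or None."""
    node = entry
    for tag in ("form", "pron", "seg"):
        node = node.find(tag)  # first direct child with this tag
        if node is None:
            return None
    return node


def build_wordlist(xml_text, marker_attr="wordlist", marker_value="yes"):
    """Return (English, NX) pairs sorted alphabetically by the English side."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map
    # in order to walk up the tree from each <gloss>.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    pairs = []
    for gloss in root.iter("gloss"):
        if gloss.get(marker_attr) != marker_value:
            continue  # not marked as a wordlist entry
        # Go up the tree to the nearest enclosing <entry>.
        node = gloss
        while node is not None and node.tag != "entry":
            node = parent_of.get(node)
        if node is None:
            continue  # gloss is not inside any <entry>
        seg = first_seg(node)
        if seg is None:
            continue  # entry has no usable NX form
        english = "".join(gloss.itertext()).strip()
        nx = "".join(seg.itertext()).strip()
        pairs.append((english, nx))

    return sorted(pairs, key=lambda pair: pair[0].casefold())


if __name__ == "__main__":
    for english, nx in build_wordlist(SAMPLE):
        print(f"{english}\t{nx}")
```

Note that in this sketch the unmarked <gloss> and the second <seg> are simply ignored, which is exactly why the equivalence between the chosen <gloss> and the first <seg> matters.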
Do you envisage the process in the same way I do? If not, how had you imagined it?