on finding compounds in the root-based index
I spent a long time this afternoon puzzling over the root-based index in the latest build, trying to answer Martin's question from Feb. 2: Is the revised compound-finding rule working correctly?
I think it is, although a lot of our test cases can't be tested fully until other files are completed and added to the build.
The current compound-finding rule looks for:
"all entries with the root or stem in question whose hyphs ALSO contain another <m corresp=m:""> pointing to an entry with a root or stem feature structure"
The revised rule is actually finding fewer compounds right now, because it requires the entries for both roots (and/or stems) to be in files included in the build.
Our test case "wəswisxnascʼəlcʼəl", with stem "wisxn" and root "cʼəl", is currently NOT being found as a compound, because c-glot.xml is unedited, so the root entry for "cʼəl" is not in the build. Since the compound-finder doesn't catch it, "wəswisxnascʼəlcʼəl" passes through to the next rule, gets caught as a reduplication, and appears under Reduplications under the stem "wisxn". This is wrong, but it's temporary. When c-glot.xml is edited and added to the build, "wəswisxnascʼəlcʼəl" will sort correctly as a compound under both "wisxn" and "cʼəl".
Meanwhile, many compounds with root √x̣əƛʼ and other roots which are in the build are being found correctly and organized under both roots.
Another good test case to look at when EJD finishes s.xml and we add it to the build is "siʔsiʔtax̣x̣ƛʼcinʔ". This is not currently being found as a compound, but it should be once s.xml is added to the build.
A couple of the other compounds with √x̣əƛʼ were not found correctly because the underdot that belongs under the x̣ had floated under the schwa, likely a copy-paste error made when completing hyphs. I have now fixed these, so they should also be found correctly in the next build.
For the record, they are:
"swahamaɬx̣x̣ƛʼcintn" Ribbon Cliff
"sqəlˀtmx̣Wax̣ƛʼcin" gelding
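For future checks, a floating underdot of this kind could be spotted mechanically. The sketch below is only a suggestion and rests on an assumption I haven't verified against the files: that the underdot is encoded as the combining dot below (U+0323) and the schwa as U+0259. The function name misplaced_underdots is made up for the example.

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"  # assumed encoding of the underdot
SCHWA = "\u0259"


def misplaced_underdots(form):
    """Return positions (in the NFD-normalized string) where a combining
    dot below directly follows a schwa instead of a consonant."""
    decomposed = unicodedata.normalize("NFD", form)
    return [
        i
        for i in range(1, len(decomposed))
        if decomposed[i] == COMBINING_DOT_BELOW and decomposed[i - 1] == SCHWA
    ]


good = "x\u0323\u0259\u019b\u02bc"  # underdot on the x, as intended
bad = "x\u0259\u0323\u019b\u02bc"   # underdot has floated onto the schwa

print(misplaced_underdots(good))  # []
print(misplaced_underdots(bad))   # [2]
```

Running something like this over the hyph pointers before a build would flag forms whose root pointer can't match for this reason.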
For further test cases, see SMK's notebook, 22Mar16.