<?xml version="1.0" encoding="utf-8"?><!-- generator="b2evolution/1.9.3" -->
<rss version="0.92">
	<channel>
		<title>Nxa&#660;amxcín (Moses) Dictionary Blog</title>
					  <link>http://hcmc.uvic.ca/blogs/index.php?blog=10</link>
			  <description>Moses dictionary</description>
			  <language>en-CA</language>
			  <docs>http://backend.userland.com/rss092</docs>
			  			  <item>
			    <title>More on inferred glosses</title>
			    <description> (Mins: 20) &lt;p&gt;I am posting this exchange about inferred glosses so that I don't have to think it through all over again in the future!&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
SMK wrote:&lt;/p&gt;

&lt;p&gt;Regarding the search engine, I blogged on 12/12/12:&lt;/p&gt;

&lt;p&gt;&quot;ECH's goal for the search engine in the web database is that, if a user searches for &quot;fat&quot;, s/he will get results including fat, fatten, fattening, fatty.  Our current settings, and our policies for adding inferred glosses, seem to be accomplishing this nicely. An entry which has &quot;fatty&quot; in its def is found by a search for &quot;fat&quot;, because it also has an inferred gloss &quot;fat&quot;.  Searching for &quot;fat*&quot; also returns defs including fat, fatten, fattening, fatty ... but also fatal, fathom, father.&quot;&lt;/p&gt;

&lt;p&gt;However, we also noticed the converse on 16/04/13:&lt;/p&gt;

&lt;p&gt;When I searched for the inflected form “fired”, I also got all the entries with “fire”.  &lt;/p&gt;

&lt;p&gt;BUT when I search for “fatty” or “fatten”, I don’t get all the entries with “fat”.  What is the difference here?&lt;/p&gt;


&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
MDH replied:&lt;/p&gt;

&lt;p&gt;I think you're just discovering that a stemming analyzer is not an educated human. It doesn't understand semantics; it just knows how to strip off (some) inflectional endings and index the resulting stems, and then how to stem the search input and search the stemmed index with it. You will never find an automated search engine that gives you perfect results.&lt;/p&gt;

&lt;p&gt;Right now, the search is paying no attention to whether things are in gloss tags or not; as I understand it, the purpose of the gloss tags is to construct an English-Nxa’amxcin list, not to aid in searching.&lt;/p&gt;

&lt;p&gt;The situation with &quot;fatty&quot; is definitely a bit odd; it appears that if you search for that word, it doesn't get stemmed prior to the search, whereas if you search for &quot;fired&quot; it does. Perhaps the stemmer avoids stemming -tty inputs because there are many which shouldn't be stemmed? (&quot;batty&quot;, &quot;natty&quot;, &quot;patty&quot;, for instance.)&lt;/p&gt;


&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
SMK continued:&lt;/p&gt;

&lt;p&gt;OK, so when I search for fatten, fattened, or fattening, I get the same 5 hits – 3 for “fattening”, one for “fattened”, and one for “fatten” –  i.e. everything with the stem “fatten”.  It doesn't go all the way down to the root “fat”, and that's fine.&lt;/p&gt;

&lt;p&gt;When I search for “fatty”, all I get is the one entry for “fatty”, as you explained above.  That's fine too.&lt;/p&gt;

&lt;p&gt;We had been adding inferred glosses for the uninflected English stems and roots of attested glosses, e.g.&lt;/p&gt;

&lt;p&gt; &amp;lt;def&amp;gt;&lt;br /&gt;
              &amp;lt;seg&amp;gt;I am &amp;lt;gloss&amp;gt;fattening&amp;lt;/gloss&amp;gt; it up&amp;lt;/seg&amp;gt;&amp;lt;bibl corresp=&quot;psn:W&quot;&amp;gt;W10.138&amp;lt;/bibl&amp;gt;&lt;br /&gt;
              &amp;lt;seg&amp;gt;&amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fatten&amp;lt;/gloss&amp;gt;&amp;lt;/seg&amp;gt;&amp;lt;bibl corresp=&quot;psn:ECH&quot;&amp;gt;ECH&amp;lt;/bibl&amp;gt;&lt;br /&gt;
              &amp;lt;seg&amp;gt;&amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fat&amp;lt;/gloss&amp;gt;&amp;lt;/seg&amp;gt;&amp;lt;bibl corresp=&quot;psn:ECH&quot;&amp;gt;ECH&amp;lt;/bibl&amp;gt;&lt;br /&gt;
 &amp;lt;/def&amp;gt;&lt;/p&gt;


&lt;p&gt;Here, &amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fatten&amp;lt;/gloss&amp;gt; adds nothing to the search capabilities, because the stemmer can find “fatten” within “fattening”.&lt;/p&gt;

&lt;p&gt;But does this entry with “fattening” get found when I search for “fat” because of the stemmer, or because of the &amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fat&amp;lt;/gloss&amp;gt;?  It must be because of the inferred gloss, because the stemmer only stems as far as “fatten”.&lt;/p&gt;

&lt;p&gt;In the case of “fatty”, which we know the stemmer doesn't operate on, the entry still gets found when I search for “fat” because of the &amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fat&amp;lt;/gloss&amp;gt;.&lt;/p&gt;

&lt;p&gt;(“fattening” and “fatty” do NOT get found when I search for “fat” just because they contain the string f-a-t, because “fatal” and “father” are NOT found by a search for “fat”.  To find anything with the string f-a-t, I would need to search for “fat*”.)&lt;/p&gt;

&lt;p&gt;So the inferred glosses do play a role in improving the search.  That said, I don't think we should be going out of our way to add inferred glosses for this reason.&lt;/p&gt;
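
&lt;p&gt;For future reference, here is a toy sketch in Python of the behaviour described in this exchange. It is not the analyzer the site actually uses: the suffix rules, the sample entry texts and the names (stem, build_index, search) are all invented for illustration. It only shows the general idea: index the stemmed words of each entry, stem the search input the same way, and see that an inferred gloss is what makes an entry findable under “fat”.&lt;/p&gt;

&lt;pre&gt;
```python
import re
from collections import defaultdict

# Hypothetical rule set: strip at most one of these endings.
# The real stemming analyzer uses different, more elaborate rules.
SUFFIXES = ("ing", "ed", "e")


def stem(word):
    """Strip one ending if at least three letters remain; otherwise leave the word alone."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def build_index(entries):
    """Map each stem to the ids of the entries whose text contains it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[stem(token)].add(entry_id)
    return index


def search(index, query):
    """Stem the search input, then look it up in the stemmed index."""
    return sorted(index.get(stem(query), set()))


# Made-up entry texts. Gloss tags are ignored by the search, so an
# inferred gloss is simply one more word in the text of the entry.
ENTRIES = {
    "e1": "I am fattening it up",             # attested gloss only
    "e2": "I am fattening it up fatten fat",  # same def plus inferred glosses
    "e3": "fatty fat",                        # attested fatty, inferred fat
    "e4": "he fired it",
    "e5": "fire",
    "e6": "father",
}
INDEX = build_index(ENTRIES)
```
&lt;/pre&gt;

&lt;p&gt;With these made-up rules, search(INDEX, &quot;fat&quot;) returns only the two entries that carry an inferred gloss “fat”, and not the “father” entry; search(INDEX, &quot;fattened&quot;) returns every entry with the stem “fatten”; and search(INDEX, &quot;fatty&quot;) returns only the “fatty” entry.&lt;/p&gt;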














</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=more_on_inferred_glosses&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Changes to gloss-tagging rules</title>
			    <description> (Mins: 50) &lt;p&gt;Much discussion over the last few weeks regarding the placing of gloss tags for generating the Eng-Nx wordlist.  I attempt to summarize our conclusions here for future reference.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;1) Why do we place inferred glosses (&amp;lt;gloss subtype=&quot;i&quot;&amp;gt;)?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;At various times, we have placed inferred glosses for &lt;a href=&quot;http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;amp;p=10378&amp;amp;more=1&amp;amp;c=1&amp;amp;tb=1&amp;amp;pb=1&quot;&gt;augmenting the search engine on the website&lt;/a&gt;, and for generating the English word list.&lt;/p&gt;

&lt;p&gt;We concluded that from here on, we ONLY need to place gloss tags for generating the English word list.  Inferred glosses do sometimes enhance the web search engine, but now that the stemming analyzer is in place, we don't need to do any further markup to help it out.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
&lt;b&gt;2) How should we tag inflected English words?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Until last week, we had been inferring the root word (or stem where relevant) when a def is an inflected or derived form of an English word, e.g.&lt;/p&gt;

&lt;p&gt;&amp;lt;def&amp;gt;&lt;br /&gt;
&amp;lt;seg&amp;gt;he is &amp;lt;gloss&amp;gt;fattening&amp;lt;/gloss&amp;gt; it up&amp;lt;/seg&amp;gt;&lt;br /&gt;
&amp;lt;bibl corresp=&quot;psn:JM&quot;&amp;gt;JM 1.2.3&amp;lt;/bibl&amp;gt;&lt;br /&gt;
&amp;lt;seg&amp;gt;&amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fatten&amp;lt;/gloss&amp;gt;&amp;lt;/seg&amp;gt;&lt;br /&gt;
&amp;lt;bibl corresp=&quot;psn:ECH&quot;&amp;gt;ECH&amp;lt;/bibl&amp;gt;&lt;br /&gt;
&amp;lt;seg&amp;gt;&amp;lt;gloss subtype=&quot;i&quot;&amp;gt;fat&amp;lt;/gloss&amp;gt;&amp;lt;/seg&amp;gt;&lt;br /&gt;
&amp;lt;bibl corresp=&quot;psn:ECH&quot;&amp;gt;ECH&amp;lt;/bibl&amp;gt;&lt;br /&gt;
&amp;lt;/def&amp;gt;&lt;/p&gt;

&lt;p&gt;This encoding means that this entry will show up three times in the English-Nxa’amxcin wordlist:  under &lt;b&gt;fat&lt;/b&gt;, under &lt;b&gt;fatten&lt;/b&gt;, and under &lt;b&gt;fattening&lt;/b&gt;.  This seems like overkill, especially when these three words will sort one after the other in the English wordlist anyway.&lt;/p&gt;

&lt;p&gt;ECH and SMK decided we would like to see the “fat”  entries as follows in the print dictionary:&lt;/p&gt;

&lt;p&gt;&lt;b&gt;fat&lt;/b&gt;: fat&lt;/p&gt;

&lt;p&gt;&lt;b&gt;fatten&lt;/b&gt;: fatten, fattened, fattening&lt;/p&gt;

&lt;p&gt;&lt;b&gt;fatty&lt;/b&gt;: fatty&lt;/p&gt;

&lt;p&gt;To accomplish this, we need to reduce the number of gloss tags we place in each entry.  Inflected English forms (-ed, -ing) should not be gloss tagged; only their root or stem should be gloss tagged.&lt;/p&gt;

&lt;p&gt;So “fattening” would now be gloss-tagged as: &lt;/p&gt;

&lt;p&gt;&amp;lt;seg&amp;gt;he is &amp;lt;gloss&amp;gt;fatten&amp;lt;/gloss&amp;gt;ing it up&amp;lt;/seg&amp;gt;&lt;/p&gt;


&lt;p&gt;MDH confirmed that the search engine is ignoring gloss tags, so the stemmer will operate on &amp;lt;gloss&amp;gt;fatten&amp;lt;/gloss&amp;gt;ing the same as it would on &amp;lt;gloss&amp;gt;fattening&amp;lt;/gloss&amp;gt;.  (That is, it will continue to return all results with the stem “fatten” when someone searches for fatten, fattened, or fattening.)&lt;/p&gt;

&lt;p&gt;MDH has created two sample Eng-Nx word lists based on the 6 files with “complete” status, one using all the gloss tags, and one omitting the inferred gloss tags.  They are in moses/trunk/docs/glosses.  We concluded that we don't want to programmatically ignore the inferred glosses, because many of them – especially the synonyms – are worth including.  But we can refer to these lists to identify the inflected English words whose gloss tags need to be revised.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
&lt;b&gt;3) How should we tag English phrasal verbs?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Where appropriate, English phrasal verbs will be enclosed in a single gloss tag, e.g. &amp;lt;gloss&amp;gt;go after&amp;lt;/gloss&amp;gt;.  This will allow us to organize the headwords in the Eng-Nx word list as follows:&lt;/p&gt;

&lt;p&gt;&lt;b&gt;go&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;go after&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;go down&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;go up&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;etc.&lt;/p&gt;


&lt;p&gt;&lt;br /&gt;&lt;br /&gt;
&lt;b&gt;4) How can we distinguish English homophones in glosses?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;English homophones in glosses will be distinguished with a secondary word (or phrase) in an @n attribute on the &amp;lt;gloss&amp;gt; tag, e.g. &amp;lt;gloss n=&quot;conflagration&quot;&amp;gt;fire&amp;lt;/gloss&amp;gt;, &amp;lt;gloss n=&quot;back of boat&quot;&amp;gt;stern&amp;lt;/gloss&amp;gt;.  These will then be rendered as follows in the print dictionary:&lt;/p&gt;

&lt;p&gt;&lt;b&gt;fire&lt;/b&gt; (conflagration):&lt;/p&gt;

&lt;p&gt;&lt;b&gt;stern&lt;/b&gt; (back of boat): &lt;/p&gt;

&lt;p&gt;We decided not to use parts of speech for @n values.  We will always use synonyms.  We need to select synonyms that will be clear to readers in the community.&lt;/p&gt;

&lt;p&gt;I have now disambiguated the English homophones listed &lt;a href=&quot;http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;amp;p=10379&amp;amp;more=1&amp;amp;c=1&amp;amp;tb=1&amp;amp;pb=1&quot;&gt;here&lt;/a&gt;, and updated the Notes on Definitions and Gloss Tagging document accordingly.  Where one homophone was far more common in the data than the other, I only added an @n value on the less common one - e.g. watch (wristwatch).&lt;/p&gt;
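
&lt;p&gt;As a rough illustration of how these rules shape the Eng-Nx word list, here is a small Python sketch. It is not the code that generates the real lists: the function names, the tuple layout and the entry ids are hypothetical placeholders. It assumes each gloss tag has already been pulled out of the data as its text, its optional @n value, and the id of the entry it came from.&lt;/p&gt;

&lt;pre&gt;
```python
from collections import defaultdict


def headword(text, n=None):
    """Render a gloss as a headword; an @n value goes in parentheses."""
    return f"{text} ({n})" if n else text


def build_wordlist(glosses):
    """Group entry ids under their headword and sort the headwords."""
    wordlist = defaultdict(list)
    for text, n, entry_id in glosses:
        wordlist[headword(text, n)].append(entry_id)
    # Plain case-insensitive sort: go, go after, go down, go up
    return dict(sorted(wordlist.items(), key=lambda item: item[0].lower()))


# Hypothetical extracted glosses: (gloss text, @n value, entry id).
GLOSSES = [
    ("go up", None, "entry-4"),
    ("fire", "conflagration", "entry-1"),
    ("fatten", None, "entry-2"),  # tagged inside the word fattening
    ("go", None, "entry-3"),
    ("go after", None, "entry-5"),
    ("go down", None, "entry-8"),
    ("fatten", None, "entry-6"),  # tagged inside the word fattened
    ("stern", "back of boat", "entry-7"),
]
WORDLIST = build_wordlist(GLOSSES)
```
&lt;/pre&gt;

&lt;p&gt;Because only the stem is gloss-tagged, the two “fatten” entries end up under one headword, and the phrasal verbs sort directly after “go”.&lt;/p&gt;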
</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=changes_to_gloss_tagging_rules&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Added @xml:lang attributes to names</title>
			    <description> (Mins: 90) &lt;p&gt;Did this through XSL with some cunning language-detection code based on content and context, and it seems to have worked pretty well. The Names page now uses the @xml:lang attribute instead of its own cruder detection code to build output.&lt;/p&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=added_xml_lang_attributes_to_names&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Collapsed five slides to a single diagram</title>
			    <description> (Mins: 60) &lt;p&gt;As planned last week.&lt;/p&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=collapsed_five_slides_to_a_single_diagra&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>More work to be done on the presentation</title>
			    <description> (Mins: 90) &lt;p&gt;Meeting to review the presentation -- my task now is to collapse six slides which begin with the picture of the filecard box into a single stepped diagram illustrating the old encoding process and the horrible binary result.&lt;/p&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=more_work_to_be_done_on_the_presentation&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Finished reworking and collapsing my part of the presentation</title>
			    <description> (Mins: 120) &lt;p&gt;Section 2 is now down to 6 slides, with more detail and more extensive notes.&lt;/p&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=finished_reworking_and_collapsing_my_par&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Work on names list</title>
			    <description> (Mins: 90) &lt;p&gt;Following Sarah's post, I've done the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Added a language filter so you can view names only in English or Nxaʔamxcín. This is a crude regex, but it works because English names always begin with caps, and Nxaʔamxcín names never do.&lt;/li&gt;
  &lt;li&gt;Turned off the traffic light display in the names page.&lt;/li&gt;
  &lt;li&gt;Added more processing to the path, to handle rendering of e.g. choice elements inside names.&lt;/li&gt;
  &lt;li&gt;Excluded lexical suffix entries.&lt;/li&gt;
  &lt;li&gt;Elaborated the captions and links a bit.&lt;/li&gt;

&lt;/ul&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=work_on_names_list&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>changes for Names pages</title>
			    <description> (Mins: 0) &lt;p&gt;Here are a few requests for the Names page on the website:&lt;/p&gt;

&lt;p&gt;DONE -exclude Lexical Suffix entries&lt;/p&gt;

&lt;p&gt;DONE -fix the display of sic/corr, so that only “Wenatchi” displays, not “WenatcheeWenatchi” (See for example the entry for “Sam George”.)&lt;/p&gt;

&lt;p&gt;DONE -put flora (plants) and fauna (animals) in the link text at the top of the page&lt;/p&gt;

&lt;p&gt;-separate out the sorting into Nx-Eng and Eng-Nx pages.  Ideally, users should be able to view the complete list, or any of the six lists by name type, sorted either by Nxa'amxcin name or by English name.  The present setup with Nx and Eng names mixed together in the Name column is somewhat confusing.  Continue to sort the Nx-Eng lists based on name tags in prons.  For the present, exclude name tags in orths when generating these lists.  Sort the Eng-Nx lists based on name tags in defs.&lt;/p&gt;


&lt;p&gt;PENDING ECH'S FURTHER DISCUSSION WITH CCT:&lt;/p&gt;

&lt;p&gt;Please also generate a printable version of the six lists of names by type.  These only need to be sorted alphabetically by Nxa'amxcin name - i.e. only include the name tags within prons when generating these lists. Ideally they would be spreadsheets with the following columns:&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Name&lt;/b&gt; (pron:seg type=&quot;p&quot;)&lt;br /&gt;
&lt;b&gt;Source&lt;/b&gt; (following bibl ... if the pron:seg type=&quot;p&quot; is NOT subtype=&quot;i&quot;)&lt;br /&gt;
&lt;b&gt;Definition&lt;/b&gt; (all defs)&lt;br /&gt;
&lt;b&gt;Pronunciation&lt;/b&gt; (pron:seg type=&quot;n&quot;)&lt;br /&gt;
&lt;b&gt;Source&lt;/b&gt; (following bibl)&lt;br /&gt;
&lt;b&gt;Word Parts&lt;/b&gt; (hyph)&lt;/p&gt;



</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=changes_for_names_pages&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>Security re-established</title>
			    <description> (Mins: 90) &lt;p&gt;We've been running the live db with open access since the last time I rebuilt it, so in the process of doing other updates (such as rolling out the Java sorting collations) I've also added back the protection that we had before. While doing this, I got bitten by the horrible eXist bug which enables you to lock yourself out of the admin account if you edit the admin user and forget to retype the password into the two password boxes (the effect is that you end up with a random admin password that you can never discover). As a result, I had to remove the server version of the app and replace it with a refreshed version of my local copy. This failed the first few times -- Tomcat tries to auto-deploy the app before the dbx files have finished uploading, so the uploaded .filepart files cannot be renamed to overwrite the ones created by the live startup. It took two or three shots to get this problem solved. The only way seems to be to let it deploy, but stop it immediately in the Tomcat manager; then delete all the dbx, lock and log files; then upload them again; then restart it in the manager.&lt;/p&gt;</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=security_re_established&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  			  <item>
			    <title>print dictionary layout and web dictionary sort orders</title>
			    <description> (Mins: 5) &lt;p&gt;1) For the linguists' dictionary, we would like to see:&lt;/p&gt;

&lt;p&gt;first phonemic representation in bold  &amp;lt;orthography in angle brackets&amp;gt; [narrow transcription(s) in square brackets], for both forms and cits - e.g.:&lt;/p&gt;

&lt;p&gt;ʔáyx̣ʷt &amp;lt;ʔáyx̌ʷt&amp;gt; [ʔáyəx̣ʷt]&lt;br /&gt;
√ʔáyx̣ʷ-t&lt;br /&gt;
1. be tired&lt;br /&gt;
2. tired, worn out&lt;/p&gt;

&lt;p&gt;• √ʔáyx̣ʷ-tl kɬʔámnc&lt;br /&gt;
&amp;lt;√ʔáyx̌ʷ-tl kɬʔámnč&amp;gt; &lt;br /&gt;
 [√ʔáyəx̣ʷ-t ləkɬəʔámənč]&lt;br /&gt;
he is tired of waiting (for you / me)&lt;/p&gt;


&lt;p&gt;2) On the website, we would ultimately like things sorted by orthography.&lt;/p&gt;
</description>
			    <link>http://hcmc.uvic.ca/blogs/index.php?blog=10&amp;title=print_dictionary_layout_and_web_dictiona&amp;more=1&amp;c=1&amp;tb=1&amp;pb=1</link>
			  </item>
			  	</channel>
</rss>
