Nxaʔamxcín (Moses) Dictionary Blog

February 20, 2014

PDF entry rewrite done

Posted by on 20 Feb 2014 in Activity log

This took a bit longer than I expected because of some problems encountered with missing data, but I think I've completed all the instructions SMK provided on Tuesday night for the new dictionary entry format. Generally I think it looks pretty good. Some notes:

1. The test PDF, which currently runs to 185 pages, is built from the same test set that SMK and I have been using to develop the root-based index; the full list of included files is below. It's basically all the completed entry files, along with any incomplete files containing morphemes required by items in the completed files.
2. "Name" entries have been excluded from the dictionary. At present, this means all entries which have the "name" feature set to true. This is too crude, because it also excludes entries for flora and fauna; it looks as though the feature structures will need to be made a bit more sophisticated so that we can exclude people's names more reliably and keep the other ones.
3. I've set it up so that it automatically generates orthographical forms where required, based on the phonemic pron, and it also sorts based on these forms. This saves having to preprocess all the files to add orths before generating the dictionary. If an orth already exists, it will be used (so when we get around to adding orths, they'll be used in place of the generated ones).
4. Where "orth?" appears in the middle of an entry, it comes from a quotation which has no phonemic <phr>, so there's nothing to generate an orth from. There are 2,204 of these in lex-suf.xml alone. Perhaps auto-phonemicization can help here.
5. There are problems with cross-references which contain refs pointing at entries which are excluded from the dictionary (see #2), so any such cross-references are ignored. This means that some legitimate refs are dropped because they share an <xr> tag with unusable ones.
6. I think that enclosing the two versions of the hyph (the regular hyph and the "translated" hyph) in the same set of paired square brackets is a bit confusing; I think there needs to be a more obvious delimiter between them, or perhaps they should be bracketed separately:
```
[[c-ka-√ƛʼaʔá-s-n c-kas-√ƛʼaʔa-stu-(3Obj, 1SgSubj)]]

should be something like:

[[c-ka-√ƛʼaʔá-s-n ◆ c-kas-√ƛʼaʔa-stu-(3Obj, 1SgSubj)]]

or

[[c-ka-√ƛʼaʔá-s-n]] [[c-kas-√ƛʼaʔa-stu-(3Obj, 1SgSubj)]]
```
(I do think the translated hyph is a great idea though.)
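
The orth handling described in the notes above (use an existing orth, otherwise generate one from the phonemic form, otherwise show "orth?") can be sketched roughly as follows. This is Python for illustration only, not the template code itself; `display_orth` and `toy_generate` are hypothetical names, and `toy_generate` is a stand-in that does not implement the real phonemic-to-orthography rules.

```python
from typing import Callable, Optional

PLACEHOLDER = "orth?"  # marker shown when no orth can be produced


def display_orth(orth: Optional[str], pron: Optional[str],
                 generate: Callable[[str], str]) -> str:
    """Return the form to print (and sort on) for one entry or quotation."""
    if orth:          # an existing orth always wins
        return orth
    if pron:          # otherwise derive one from the phonemic form
        return generate(pron)
    return PLACEHOLDER  # no phonemic form, so nothing to generate from


def toy_generate(pron: str) -> str:
    # Stand-in only: NOT the project's actual conversion rules.
    return f"gen({pron})"
```

Passing the generator in as a parameter keeps the fallback order separate from the conversion rules, so the same logic would apply once real orths start to be added.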

Included files:

personography.xml
c-rtr.xml
kw.xml
h-phar-part1.xml
affix_aspectual.xml
affix_k-m.xml
affix_u-CAPS.xml
affix_glot-ix.xml
l-affric.xml
h-phar-part2.xml
phar-w.xml
h.xml
glottal.xml
s-rtr.xml
affix_n-t.xml
kw-glot.xml
qw-glot.xml
lex-suf.xml
lex-pref.xml
pron.xml
particles.xml

February 19, 2014

Ordering change done; beginning of rewrite of entry rendering in PDF

Posted by on 19 Feb 2014 in Activity log

Did the last tweak to the root-based-index sorting system per SMK, then started the rewrite of the rendering of entries in PDF, per SMK's guide. I've built orth-generation into the template, rather than having to work with pre-generated special versions of the data. It's coming along; another couple of hours' work and I think it'll be done.

February 18, 2014

Another ordering change to be done

Posted by on 18 Feb 2014 in Tasks

From SMK:

A couple of weeks ago, we "downgraded" the clitics in the sort by preceding any clitic values with "0000_" when generating the sort key, and then stripping the added bit out again before calculating the indent levels.

I think we need to precede the aspectual affixes with 0000_ ... and then downgrade the clitics further by preceding them with 0000_0000_.

These are the aspectual affixes:

s-IMPF
ʔas
ʔac
sac
kaɬ-PR
kas
mix

And these are the clitics:

kn
kW
kp
kt
lx
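
A rough sketch of the two-level downgrade requested above, in Python for illustration. The morpheme lists and the "0000_" prefixes come from SMK's message; the four-digit weights, the space separator, and the function names are assumptions made for the example, not the real sort-key format.

```python
import re

ASPECTUAL = {"s-IMPF", "ʔas", "ʔac", "sac", "kaɬ-PR", "kas", "mix"}
CLITICS = {"kn", "kW", "kp", "kt", "lx"}

ASPECT_PREFIX = "0000_"        # aspectual affixes: downgraded once
CLITIC_PREFIX = "0000_0000_"   # clitics: downgraded further
SEP = " "                      # assumed separator between key components


def build_sort_key(components):
    """components: (morpheme_id, weight) pairs, already in key order."""
    parts = []
    for morph_id, weight in components:
        if morph_id in CLITICS:
            parts.append(CLITIC_PREFIX + weight)
        elif morph_id in ASPECTUAL:
            parts.append(ASPECT_PREFIX + weight)
        else:
            parts.append(weight)
    return SEP.join(parts)


def strip_downgrades(key):
    """Remove the added prefixes before indent levels are calculated."""
    return SEP.join(re.sub(r"^(0000_)+", "", part) for part in key.split(SEP))
```

With keys built this way, a plain string sort puts a clitic form before an aspectual form, and both before any other form, wherever the preceding part of the key is identical.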

February 17, 2014

"Translated hyph" working

Posted by on 17 Feb 2014 in Activity log

I have a working setup which translates each morpheme in a hyph into its label element (if the original entry has one), or failing that the first gloss element in its entry, or failing that its id (which may not be ideal). I'm using xsl:key for this so it's quick. I should probably set up keys in place of some of the other variables I'm using.
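
The fallback order (label, then first gloss, then id) can be illustrated with a small Python sketch. The dictionary lookup here only plays the role that xsl:key plays in the real stylesheet; the data structures, function names, and the toy lexicon entries are all hypothetical.

```python
def translate_morpheme(morph_id, lexicon):
    """Return label, else first gloss, else the id itself."""
    entry = lexicon.get(morph_id, {})
    if entry.get("label"):
        return entry["label"]
    glosses = entry.get("glosses") or []
    if glosses:
        return glosses[0]
    return morph_id  # last resort: the bare id


def translate_hyph(hyph, lexicon):
    """hyph: list of (leading_delimiter, morpheme_id) pairs."""
    return "".join(delim + translate_morpheme(mid, lexicon)
                   for delim, mid in hyph)


# Toy data for illustration only; not taken from the real entry files.
LEXICON = {
    "ḥac": {"glosses": ["tie", "bind"]},
    "mín": {"glosses": ["nominalizer"]},
    "AFFIX1": {"label": "LABEL1", "glosses": ["ignored"]},
}

print(translate_hyph([("√", "ḥac"), ("=", "mín")], LEXICON))
```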

generating the "translated hyph"

Posted by on 17 Feb 2014 in Activity log

Following Montler 2012, we want to be able to include a "translation" or "gloss" of words' hyphs, both in the main alphabetical dictionary entries and in the root-based index.

For a hyph like: √ḥac=mín

The translation would look like: √tie=nominalizer

Here is an initial attempt at explaining how to programmatically generate these "translations":

-Replace a root, lexical prefix, or lexical suffix with the first <gloss> in its entry. (Question for ECH: what if the root has no <gloss>, as in the many "Meaning unclear" inferred roots?)

-Replace an affix (including those in the 5 affix files, and pron.xml) with a <label> to be added by ECH and SMK, based on MLW 2003.

February 12, 2014

formatting for morpheme index

Posted by on 12 Feb 2014 in Activity log, Tasks

Here are some requests for formatting changes to the morpheme index, per yesterday's discussion with ECH.

-if possible, nothing should be indented more than 3 levels.

-make the indents larger, so it's easier to see the different levels

-remove the group headings, and the spaces between groups

-bold the plain root/stem

-include the following on each line:
-Nxa'amxcin word (orth), size 12 font
-first English def (not gloss), size 10 font
-hyph and "translated hyph" in size 8 font
-if possible, page reference to this word's full entry in the main alphabetical dictionary

More to follow re: generating the "translated hyph"!

February 5, 2014

More minor tweaks...

Posted by on 05 Feb 2014 in Activity log

Based on suggestions from SMK.

February 4, 2014

We are really getting there!

Posted by on 04 Feb 2014 in Activity log

The new combined group-then-sort approach is actually working. There was one last tweak we had to implement, because the sort force of clitics is problematic. Clitics have a specific weight in the numerical sort sequence, which is used to determine the order in which morphemes are listed in the sort key; but once that order has been determined, they need to be "downgraded" during the actual sort. We've achieved this by massaging the generated sort key to precede any clitic values with "0000_", which ensures they actually sort before any other sequences with non-clitics in the same position where the preceding part of the sequence is identical, and then stripping the added bit out again before we calculate the indent levels.

So this is what we're now doing:

1. Creating groups of subforms of the root by extracting them in a specific order from all the related forms, excluding all previously-extracted ones so that each form appears only once under each root;
2. Rendering the subgroups in a different order from the discovery order;
3. Sorting the items within each subgroup based on a generated sort key which gives a numerical weight to each morpheme discovered working outwards from the root (after stripping duplicated roots in a careful manner which differs depending on the infix separating them);
4. Generating an indent level for each item in the subgroup based on comparing the lengths of their sort keys after stripping the common component from the left side of each.
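
The last step, indent levels from sort keys, might look something like the Python sketch below. It assumes each sort key has already been split into a list of weight components and had any "0000_" prefixes stripped, and it takes the indent to be simply the number of components left after removing what every key in the subgroup shares on the left. The real calculation may well differ in detail; the function names are hypothetical.

```python
def common_prefix_len(keys):
    """Number of leading components shared by every key in the subgroup."""
    if not keys:
        return 0
    shared = 0
    for column in zip(*keys):       # stops at the shortest key
        if len(set(column)) != 1:   # components differ at this position
            break
        shared += 1
    return shared


def indent_levels(keys):
    """One indent level per key: components left after the shared prefix."""
    shared = common_prefix_len(keys)
    return [len(key) - shared for key in keys]
```

For a subgroup whose keys are `["0500"]`, `["0500", "0200"]` and `["0500", "0200", "0300"]`, the shared component is the first one, so the three items get indents 0, 1 and 2.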

Can this be it? Looks like it might be.

February 3, 2014

New numerical sort working but...

Posted by on 03 Feb 2014 in Activity log

The new numerical sort is doing what it's supposed to do, and I've now added handling for the infixes with repeated roots, but it turns out that we still need to do the initial grouping of forms first, and then use this approach as a secondary sort within the groups. I've also coded an additional pass through an entry sequence which calculates first relative and then absolute values for indents, based on eliminating the longest common initial subsequence and comparing what's left, but whether that will prove to be useful or not remains to be seen. Tomorrow I'll start integrating the two approaches (old grouping and new sorting within groups, except for compounds which already have their own subsort).

handling infixes in the Plan B sort

Posted by on 03 Feb 2014 in Activity log

If there are two instances of the same root's xml:id in a word's hyph, it's because the root morpheme is split up by an infix. These infixes need to be handled as follows:

-inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"): delete the second/rightmost instance of the root. (That is, treat these infixes like suffixes.)

-repetitive (xml:id="REP"): delete the first/leftmost instance of the root. (That is, treat this infix like a prefix.)
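
These two rules can be sketched in Python as follows. The infix ids are the ones given above; the representation of a hyph as a flat list of xml:ids with the infix sitting between the two root copies, the function name, and the sample ids are assumptions for illustration.

```python
SUFFIX_LIKE = {"ʔ", "CHAR", "OC"}  # delete the second/rightmost root copy
PREFIX_LIKE = {"REP"}              # delete the first/leftmost root copy


def strip_duplicate_root(morph_ids, root_id):
    """morph_ids: the xml:ids in a word's hyph, in order."""
    positions = [i for i, m in enumerate(morph_ids) if m == root_id]
    if len(positions) != 2:
        return list(morph_ids)      # root is not split by an infix
    first, second = positions
    infixes = set(morph_ids[first + 1:second])
    if infixes & PREFIX_LIKE:
        drop = first                # treat the infix like a prefix
    elif infixes & SUFFIX_LIKE:
        drop = second               # treat the infix like a suffix
    else:
        return list(morph_ids)      # unrecognized infix: leave untouched
    return [m for i, m in enumerate(morph_ids) if i != drop]
```

So a hyph with ids `["root1", "OC", "root1", "sfx"]` becomes `["root1", "OC", "sfx"]`, while `["root1", "REP", "root1"]` becomes `["REP", "root1"]`.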

This is an XML dictionary project based primarily on the materials compiled by the late M. Dale Kinkade during fifteen years of work in the 1960s and 1970s with more than a dozen native speakers of the language, but it also includes materials compiled by Ewa Czaykowska-Higgins in the early 1990s.
