The problem of duplicate @xml:id attributes on entries has now become a serious issue for building the print dictionary, because I'm unable to process the entire collection to produce the book; to build the dictionary I have to use XInclude to create a single XML source file, and when I do that there are over 1600 duplicate ids, which prevent some of the processing steps from succeeding.
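For reference, duplicates like these can be found before the XInclude build by scanning the source files directly. This is a minimal sketch using Python's standard library; the function names are mine, and it assumes nothing about the project's actual build setup beyond the xml:id attributes themselves:

```python
# Count how often each xml:id value occurs across a set of XML files,
# then report the values that occur more than once.
import xml.etree.ElementTree as ET
from collections import Counter

# xml:id in Clark notation, as ElementTree sees it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def collect_ids(paths):
    """Return a Counter of every xml:id value found in the given files."""
    ids = Counter()
    for path in paths:
        for elem in ET.parse(path).iter():
            xid = elem.get(XML_ID)
            if xid is not None:
                ids[xid] += 1
    return ids

def duplicates(paths):
    """xml:id values that occur more than once, with their counts."""
    return {xid: n for xid, n in collect_ids(paths).items() if n > 1}
```

Running `duplicates()` over the whole file set gives the same information the failed build reports, but per-id, which makes it easier to see which entries need renaming.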
I've taken a quick look at where the duplicates tend to be concentrated, by adding the files in alphabetical order and checking how many duplicates appear with each addition. These files create no problems (i.e. they have no duplicates among themselves):
affix_glot-ix.xml affix_k-m.xml affix_n-t.xml affix_u-CAPS.xml c.xml c-glot.xml c-rtr.xml glottal.xml h.xml h-phar-part1.xml h-phar-part2.xml l-affric.xml lex-suff.xml new-data-2013.xml p-glot.xml phar-w.xml qw-glot.xml s-rtr.xml t-glot.xml xw.xml
When I add each of the remaining files on its own (only one at a time), these are the results:
k.xml: 100 duplicates, k-glot.xml: 18, kw.xml: 2, kw-glot.xml: 2, l.xml: 3, l-fric.xml: 6, m.xml: 3, n.xml: 97, p.xml: 7, particles.xml: 4, pron.xml: 2, q.xml: 4, q-glot.xml: 3, qw.xml: 1, rescued.xml: 54, s.xml: 2, t.xml: 20, ww-glot.xml: 4, x.xml: 3, x-uvul.xml: 4, yy-glot.xml: 4
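That incremental check can be sketched as a small script: start from the known-good set, then test each remaining file on its own against it. This is a generic sketch, not the project's actual tooling, and the file names passed in would be the ones listed above:

```python
# For a candidate file, list the xml:id values that collide with a
# known-good set of files, or that repeat within the candidate itself.
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def ids_in(path):
    """All xml:id values in one file, in document order."""
    return [e.get(XML_ID) for e in ET.parse(path).iter()
            if e.get(XML_ID) is not None]

def new_duplicates(good_paths, candidate):
    """xml:id values in `candidate` that are already in the good set,
    or that occur more than once inside `candidate`."""
    good = set()
    for p in good_paths:
        good.update(ids_in(p))
    cand = Counter(ids_in(candidate))
    return sorted(xid for xid, n in cand.items() if xid in good or n > 1)
```

Looping `new_duplicates()` over the excluded files reproduces the per-file counts above, and the returned id values show exactly which entries to fix first.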
What I'm going to do is develop the dictionary output using only the valid files, and then add the others in as they get fixed. In the meantime, it might be worth having a go at some of the low-hanging fruit (the files with only two or three duplicates). More will show up as we add those in, of course -- there will be duplicates among the currently-excluded files themselves, as well as ids they share with the "good" files. So the dictionary PDFs will shrink in size for now, but I'll be able to start doing things like generating page-references that depend on xml:ids.