Permalink 02:28:18 pm, by mholmes, 2 words, 6 views   English (CA)
Categories: G&T Hours; Mins. worked: 0

MDH: 247 - 1 = 246 hours G&T

Off early.

Permalink 02:27:47 pm, by mholmes, 42 words, 2 views   English (CA)
Categories: Activity log; Mins. worked: 120

Began work on XML schema

I've started mapping out XML structures for the properties and all their contents, and sent some queries to JS-R. Looks like a TEI extension with a lot of specialist elements and attributes is what we need, and I've started an ODD file.

Permalink 01:13:13 pm, by mholmes, 503 words, 13 views   English (CA)
Categories: Activity log; Mins. worked: 180

Hyphs and appendices: some decisions

Lots of work this morning on clarifying what we should do with hyphs. Here's the breakdown:

  • We should fix the delimiters in the source data, not in the output process. That means:
    -ʔ- becomes <ʔ>
    +a+ becomes <a>
    +C₂+ becomes <C₂>
    +CVC+ becomes <CVC>
    BUT ONLY (in the last two cases) where the same root morpheme appears before and after the sequence.
  • The string-replacement code I wrote yesterday to crudely accomplish this in the output should be removed, since the standard hyph output will now be correct anyway.
  • The deletions mentioned in SMK's post should be carried out by pre-processing the whole hyph before the "translated hyph" is created:
    • Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"), but only when they are infixes; you can now tell this context by the angle-bracket text nodes surrounding them. Note that there may be more than one infix separating the two roots (there are no instances of this right now, but there will be as more data is processed).
    • Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root (again, only where it is an infix, determined by surrounding text nodes).
  • The other part of SMK's post, relating to the situation where a root morpheme has no gloss, is now changed: we do not keep the other instance of the root, but instead we replace the unglossed root with a smallcapped label "Unk", signifying "unknown".
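
As a rough illustration of the source-data fix in the first bullet, here is a minimal Python sketch. It assumes a hyph is an element (called `hyph` here, a hypothetical name) whose `<m>` children are separated by delimiter text nodes, and that an infix can be recognised by its text content. The function name `fix_infix_delimiters` and the `INFIX_RULES` table are illustrative only and are not the project's actual code.

```python
import xml.etree.ElementTree as ET

# Infix text -> (delimiter used in the source, must the same root flank it?)
# Only the last two cases require the same root on both sides.
INFIX_RULES = {
    "ʔ": ("-", False),
    "a": ("+", False),
    "C₂": ("+", True),
    "CVC": ("+", True),
}

def fix_infix_delimiters(hyph):
    """Rewrite -X- / +X+ around infixes as <X> in the hyph's text nodes (in place)."""
    morphs = list(hyph)
    for i in range(1, len(morphs) - 1):
        prev, cur, nxt = morphs[i - 1], morphs[i], morphs[i + 1]
        rule = INFIX_RULES.get((cur.text or "").strip())
        if rule is None:
            continue
        delim, needs_same_root = rule
        # The text node before cur is prev.tail; the one after it is cur.tail.
        if (prev.tail or "").strip() != delim or (cur.tail or "").strip() != delim:
            continue
        if needs_same_root and prev.get("corresp") != nxt.get("corresp"):
            continue
        prev.tail = "<"
        cur.tail = ">"
    return hyph

# Assumed shape of the source data, based on the first example in SMK's post.
SAMPLE = (
    '<hyph><m corresp="m:ʔiɬn">ʔiɬ</m>+<m corresp="m:CHAR">CVC</m>+'
    '<m corresp="m:ʔiɬn">n</m>-<m corresp="m:ul">úl</m></hyph>'
)

if __name__ == "__main__":
    fixed = fix_infix_delimiters(ET.fromstring(SAMPLE))
    print("".join(fixed.itertext()))
```

Note that when the tree is serialised again, the angle brackets in the text nodes will be written as the entities `&lt;` and `&gt;`.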

ECH also sends these instructions re changes to the indexes in the appendix, having changed the feature structures of clitics. I've implemented these:

Put the List of Root Morphemes first, and maybe change the headings as I've indicated here:
Four Appendices

1. List of Root Morphemes
     - all roots (but not stems)
       (i.e. anything with <f name="baseType"><symbol value="root"/></f>)
2. List of Lexical Affixes (from lex-pref.xml and lex-suf.xml)
3. List of Grammatical Morphemes
     - all grammatical affixes (those in the five affix xml files) plus inflectional clitics
       The inflectional clitics are defined as <f name="baseType"><symbol value="clitic"/></f> AND <f name="cliticType"><symbol value="inflectional"/></f>
4. List of Particles
     - all particles (particles.xml)

It makes sense to have the List of Root Morphemes because it provides different information from what is in the Root-based Index. The Root-based Index is a listing of all the words in the dictionary organized by root, with morphological breakdowns; the List of Root Morphemes in the Appendix is simply a list of all the root morphemes. It is therefore a subset of the information in the Root-based Index, but in listing only morphemes it parallels the other three appendices, which are lists of different categories of morphemes. So the Appendices will list all the individual morphemes in the dictionary.
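
The selection criteria for the four appendices could be sketched roughly as follows in Python. This is only an illustration under several assumptions: entries are un-namespaced elements containing the `<f>`/`<symbol>` structures quoted above (real TEI data would be namespaced), the names of the five grammatical affix files are not given here and so are passed in by the caller, and the order in which the rules are tested is my own guess. `feature` and `appendix_for` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

LEXICAL_AFFIX_FILES = {"lex-pref.xml", "lex-suf.xml"}
PARTICLE_FILES = {"particles.xml"}

def feature(entry, name):
    """Return the symbol value of the named <f> inside an entry, or None."""
    for f in entry.iter("f"):
        if f.get("name") == name:
            symbol = f.find("symbol")
            return symbol.get("value") if symbol is not None else None
    return None

def appendix_for(entry, source_file, grammatical_affix_files):
    """Name the appendix an entry belongs to, or None if it belongs to none."""
    if feature(entry, "baseType") == "root":
        return "List of Root Morphemes"
    if source_file in LEXICAL_AFFIX_FILES:
        return "List of Lexical Affixes"
    if source_file in grammatical_affix_files:
        return "List of Grammatical Morphemes"
    if (feature(entry, "baseType") == "clitic"
            and feature(entry, "cliticType") == "inflectional"):
        return "List of Grammatical Morphemes"
    if source_file in PARTICLE_FILES:
        return "List of Particles"
    return None

# Hypothetical entry; the element name "entry" and file name "clitics.xml"
# are placeholders, not taken from the project.
SAMPLE = """<entry>
  <fs>
    <f name="baseType"><symbol value="clitic"/></f>
    <f name="cliticType"><symbol value="inflectional"/></f>
  </fs>
</entry>"""

if __name__ == "__main__":
    print(appendix_for(ET.fromstring(SAMPLE), "clitics.xml", set()))
```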


Permalink 03:44:12 pm, by mholmes, 13 words, 3 views   English (CA)
Categories: Activity log; Mins. worked: 50

Expenses claim

Posting time spent on the Pro-D expenses claim for my TESOL conference attendance.

Permalink 03:43:31 pm, by mholmes, 6 words, 10 views   English (CA)
Categories: Activity log; Mins. worked: 60

TEI Journal work

Preparation for editorial meeting next week.

Permalink 03:34:58 pm, by mholmes, 40 words, 3 views   English (CA)
Categories: Activity log; Mins. worked: 30

Tested out the BlueJeans system

It seems that video conferencing will be done through BlueJeans, so SA and I tested this out; there's a simple deb for Linux, and it works fine in Firefox, with around a half-second delay between our desks. UVic supports it.

Permalink 03:20:17 pm, by mholmes, 597 words, 20 views   English (CA)
Categories: Activity log; Mins. worked: 120

Reduplications in hyphs

I'm working on SMK's instructions for hyphs here. I've implemented the first part, which is easy: it's just a search-and-replace on strings. But I'm struggling with the second part, mainly because I don't understand the examples properly. My questions are below; waiting for clarification from ECH.

-- when generating the translated hyph,

a) Delete the second/rightmost instance of the root after these morphemes: inchoative (xml:id="ʔ"), characteristic (xml:id="CHAR"), out of control (xml:id="OC"):

For example: [[√ʔiɬ<CVC>n-úl • √eat<char>-attrib]]

BUT, if the root has no gloss, DO keep the second part of the root:

For example: [[k-√cúwˀ<CVC>x=ánaʔ • loc-√cúwˀ<char>x=ear]]

b) Delete the first/leftmost instance of the root before the repetitive morpheme (xml:id="REP"), and put the root symbol before the second part of the root.

For example: [[√p<a>tix̣ʷ • <rep>√test]]

Again, if the root has no gloss, keep the first part of the root.

For example: [[√p<a>tix̣ʷ • √p<rep>tix̣ʷ]]
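
To check my reading of rules a) and b) as quoted in this post, here is a small Python sketch that works only on the sequence of `corresp` values and decides which morphemes survive into the translated hyph. It handles a single infix between the two root parts, treats "has a gloss" as membership in a set supplied by the caller, and does not model delimiters or the placement of the root symbol. The `m:` prefix on the infix ids follows the pattern in the hyph markup but is an assumption for ʔ, OC and REP, and `prune_roots` is a hypothetical name.

```python
DROP_SECOND_AFTER = {"m:ʔ", "m:CHAR", "m:OC"}  # rule a
DROP_FIRST_BEFORE = {"m:REP"}                   # rule b

def prune_roots(corresps, glossed):
    """Return the corresp values kept in the translated hyph.

    corresps: corresp values of the <m> elements, in order.
    glossed:  set of corresp values whose entries have a gloss.
    """
    drop = set()
    for i in range(1, len(corresps) - 1):
        root = corresps[i - 1]
        if corresps[i + 1] != root:
            continue  # the morpheme at i is not sitting inside a split root
        if root not in glossed:
            continue  # unglossed root: keep both parts
        if corresps[i] in DROP_SECOND_AFTER:
            drop.add(i + 1)
        elif corresps[i] in DROP_FIRST_BEFORE:
            drop.add(i - 1)
    return [c for k, c in enumerate(corresps) if k not in drop]

if __name__ == "__main__":
    # First example: the second part of the root is dropped.
    print(prune_roots(["m:ʔiɬn", "m:CHAR", "m:ʔiɬn", "m:ul"],
                      {"m:ʔiɬn", "m:ul"}))
    # Second example: the root has no gloss, so nothing is dropped.
    print(prune_roots(["m:k-LOC", "m:cuwx", "m:CHAR", "m:cuwx", "m:anaʔ"],
                      {"m:k-LOC", "m:anaʔ"}))
```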

The first example comes from this (I'll pretty-print the hyph for clarity):

 <m corresp="m:ʔiɬn">ʔiɬ</m>
 <m corresp="m:CHAR">CVC</m>
 <m corresp="m:ʔiɬn">n</m>
 <m corresp="m:ul">úl</m>

Question 1: Can I ignore the intervening characters between the <m> elements for the purposes of detecting infixes? For instance, can I search for a sequence of:

<m>rootX</m> <m>CHAR</m> <m>rootX</m>

and be sure it's OK to delete the second root, regardless of what text nodes happen to intervene? Or might there be instances of, for instance,

<m>rootX</m>-<m>CHAR</m>-<m>rootX</m>

where instead of + characters, there are hyphens, and the relationship is now entirely different so the deletion should not be triggered?
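
To make Question 1 concrete, the two possible answers can be sketched as one Python function with a switch. This is a minimal sketch, assuming a hyph is an element (hypothetically named `hyph`) with `<m>` children and delimiter text nodes between them; `second_root_indices` and `INFIX_IDS` are illustrative names, and the `m:ʔ` and `m:OC` values are guessed from the `m:CHAR` pattern.

```python
import xml.etree.ElementTree as ET

INFIX_IDS = {"m:ʔ", "m:CHAR", "m:OC"}

def second_root_indices(hyph, allowed_delims=None):
    """Return indices of <m> children that repeat a root after one or more infixes.

    With allowed_delims=None the text nodes between morphemes are ignored.
    Otherwise every text node inside the matched sequence must be in the set.
    """
    morphs = list(hyph)
    hits = []
    for i, first in enumerate(morphs):
        if first.get("corresp") in INFIX_IDS:
            continue
        j = i + 1
        while j < len(morphs) and morphs[j].get("corresp") in INFIX_IDS:
            j += 1
        if j == i + 1 or j >= len(morphs):
            continue  # no infix follows, or nothing comes after the infixes
        if morphs[j].get("corresp") != first.get("corresp"):
            continue
        # Text nodes between first and its repeat are the tails of morphs[i:j].
        delims = [(m.tail or "").strip() for m in morphs[i:j]]
        if allowed_delims is not None and any(d not in allowed_delims for d in delims):
            continue
        hits.append(j)
    return hits

# Assumed source shape for the first example, with + delimiters.
SAMPLE = (
    '<hyph><m corresp="m:ʔiɬn">ʔiɬ</m>+<m corresp="m:CHAR">CVC</m>+'
    '<m corresp="m:ʔiɬn">n</m>-<m corresp="m:ul">úl</m></hyph>'
)

if __name__ == "__main__":
    hyph = ET.fromstring(SAMPLE)
    print(second_root_indices(hyph))                        # delimiters ignored
    print(second_root_indices(hyph, allowed_delims={"+"}))  # delimiters checked
```

If the answer to the question is that hyphens signal a different relationship, the second call form is the one to use, and a hyphen-delimited sequence would simply not match.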

Question 2: I'm a bit confused about the idea of retaining the second root if it has no gloss. Why? The example comes from this hyph:

 <m corresp="m:k-LOC">k</m>
 <m corresp="m:cuwx">cúwˀ</m>
 <m corresp="m:CHAR">CVC</m>
 <m corresp="m:cuwx">x</m>
 <m corresp="m:anaʔ">ánaʔ</m>

and the entry xml:id="cuwx" is indeed lacking a gloss (it's an inferred entry). But if we delete reduplicated roots in most cases, but not in this one, aren't people going to assume that the second instance of the morpheme, which shows up as "x", is something else entirely, because they will assume that a second instance has already been deleted, as it would be in most normal cases? Are we expecting people to distinguish between a case where a root disappears because it has a gloss, and one where it doesn't disappear because it doesn't have a gloss? That seems extremely confusing to me. I would naturally assume that if reduplicated roots are normally deleted, that's the case here too, and the "x" is a subsequent and completely different morpheme (especially since it bears no resemblance to the first instance, "cúwˀ").


Permalink 04:47:19 pm, by mholmes, 11 words, 4 views   English (CA)
Categories: G&T Hours; Mins. worked: 0

MDH: 246 + 1 = 247 hours G&T

Trying to debug a page-rendering issue in XSL-FO for the Moses project.

Permalink 04:29:21 pm, by mholmes, 17 words, 3 views   English (CA)
Categories: Activity log; Mins. worked: 120

Upyerdb updates finished

Excludes are now tested and working, and I've created some more useful sets of update scripts.

Permalink 04:26:31 pm, by mholmes, 134 words, 21 views   English (CA)
Categories: Activity log; Mins. worked: 360

Work on PDF rendering


  • Fixed naming of particle index.
  • Split out lexical affix index, particle index and root index into separate page-sequences so they can have appropriate running headers.
  • Fixed some display spacing issues with translated hyphs (compensating for superfluous spaces in data).
  • Fixed a bug in xsl:key to look up glosses, so glosses are now appearing for lexical items that have them in translated hyphs.
  • Fixed a problem with page-masters for front matter and appendices (page-masters were not properly configured for recto and verso).
  • Fixed a blank-page bug (referenced master was not there, so page was unselectable in PDF output).
  • Began work on handling of various infixes (this will be very complicated).

Also, GN hacked our fonts to add subscript 1 and 2, since we need these, and the font author has not responded to our requests.
