Permalink 10:52:18 am, by skell
Categories: Tasks

orth rules: word-initial i-insertion, removal of certain stress marks

I have been meaning to mention that we'll also need to orth the contents of <ref>s inside <xr>s.

1) Here is the rule for word-initial i-insertion in orthing:

Where "C" equals any of the following consonants: ʔ c c̣ cʼ h ḥ ḥʷ k kʼ kʷ kʼʷ l ḷ lˀ ḷˀ ɬ ƛʼ m mˀ n nˀ p pʼ q qʼ qʷ qʼʷ r rˀ s ṣ t tʼ w wˀ x xʷ x̣ x̣ʷ y yˀ ʕ ʕˀ ʕʷ ʕˀʷ.

and "V" equals any of the following vowels: á, a, í, i, ú, u, ə́, ə.

and a "word" in a phr is defined as a string preceded by > or whitespace, ignoring any morpheme delimiters,

IF a pron:seg type="p" begins with the string CyV,

OR IF a phr type="p" contains a word which begins with the string CyV,

OR IF a ref begins with the string CyV,

THEN add an i before the y when generating the orth.

I think this is pretty straightforward, although some contexts will still have to be found, checked and manually corrected: CyV when not word-initial, other Cy and Cw sequences, word-initial šyC sequences, and auto-orthed šiyV sequences.
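For illustration only, rule 1) could be sketched in Python roughly as follows. This is not the actual orthing code, and it rests on several assumptions that are not part of the rule: it works on plain strings rather than on the XML, it guesses that "-" and "=" are the morpheme delimiters, and the names insert_initial_i, MORPHEME_DELIMS and per_word are made up for the example. It decomposes the text (NFD) internally so that precomposed and combining spellings of the same letter are treated alike, and returns NFC.

```python
import re
import unicodedata

# Consonants and vowels exactly as listed in the rule above.
CONSONANTS = (
    "ʔ c c̣ cʼ h ḥ ḥʷ k kʼ kʷ kʼʷ l ḷ lˀ ḷˀ ɬ ƛʼ m mˀ n nˀ p pʼ q qʼ qʷ qʼʷ "
    "r rˀ s ṣ t tʼ w wˀ x xʷ x̣ x̣ʷ y yˀ ʕ ʕˀ ʕʷ ʕˀʷ"
).split()
VOWELS = "á a í i ú u ə́ ə".split()

# ASSUMPTION: the real morpheme delimiters are not given in the post.
MORPHEME_DELIMS = "-="


def _alt(items):
    """Regex alternation over items, decomposed and longest first."""
    decomposed = sorted(
        (unicodedata.normalize("NFD", s) for s in items), key=len, reverse=True
    )
    return "(?:" + "|".join(map(re.escape, decomposed)) + ")"


_D = "[" + re.escape(MORPHEME_DELIMS) + "]*"  # optional delimiters, ignored
# Group 1 is C (plus any delimiters); the lookahead requires V after the y.
_CYV = "(" + _alt(CONSONANTS) + _D + ")y(?=" + _D + _alt(VOWELS) + ")"

START_OF_STRING = re.compile(r"\A" + _CYV)    # pron:seg and ref: whole string
START_OF_WORD = re.compile(r"(?<!\S)" + _CYV)  # phr: every whitespace-separated word


def insert_initial_i(text, per_word=False):
    """Insert i before y in initial CyV; per_word=True checks every word."""
    decomposed = unicodedata.normalize("NFD", text)
    pattern = START_OF_WORD if per_word else START_OF_STRING
    return unicodedata.normalize("NFC", pattern.sub(r"\1iy", decomposed))
```

With per_word=False only the start of the whole string is checked, as for a pron:seg or a ref; with per_word=True every whitespace-separated word is checked, as for a phr. Anything outside the rule as stated, such as CyV that is not word-initial, is left untouched for manual checking.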

2) Also, here is the final rule re: removing more stress marks in orths: "If there is only one vowel in a word, remove all stress marks from the word."

Here's my attempt to state it as an algorithm:

Where "V" equals any of the following vowels: á, a, áa, aa, í, i, íi, ii, ú, u, úu, uu, ə́, ə, ə́ə, əə.

IF a pron:seg type="p" contains only one V

OR IF a word (delimited by > and whitespace, or whitespace and <, or whitespace and whitespace, ignoring any morpheme delimiters) within a phr:seg type="p" contains only one V

OR IF a ref contains only one V

THEN replace the following when generating the orth:

á with a
í with i
ú with u
ə́ with ə
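Rule 2) could be sketched the same way. Again this is only an illustration under assumptions: it operates on plain strings rather than on the XML, words are split on whitespace only, and the function names are invented for the example. Long vowels such as áa are counted as a single V, as in the list above.

```python
import re
import unicodedata

ACUTE = "\u0301"  # combining acute accent: the stress mark after decomposition

# Vowels exactly as listed in the rule above; long vowels count as one V.
VOWELS = "á a áa aa í i íi ii ú u úu uu ə́ ə ə́ə əə".split()


def _nfd(s):
    return unicodedata.normalize("NFD", s)


# Longest alternatives first, so that "áa" is counted as one V, not two.
_ALTS = sorted((_nfd(v) for v in VOWELS), key=len, reverse=True)
# The lookahead keeps a match from ending just before a stress mark,
# which would split a stressed vowel from its accent.
VOWEL_RE = re.compile("(?:" + "|".join(map(re.escape, _ALTS)) + ")(?!" + ACUTE + ")")
# A stressed vowel is a base vowel followed by the combining acute.
STRESSED_RE = re.compile("([aiuə])" + ACUTE)


def _destress_word(word):
    """word is already NFD; strip stress only if it contains exactly one V."""
    if len(VOWEL_RE.findall(word)) == 1:
        return STRESSED_RE.sub(r"\1", word)
    return word


def remove_single_vowel_stress(text, per_word=False):
    """Apply the rule to the whole string, or to each word if per_word=True."""
    decomposed = _nfd(text)
    if per_word:
        decomposed = re.sub(r"\S+", lambda m: _destress_word(m.group()), decomposed)
    else:
        decomposed = _destress_word(decomposed)
    return unicodedata.normalize("NFC", decomposed)
```

Here per_word=False corresponds to a pron:seg or a ref, and per_word=True to the words within a phr. A morpheme delimiter between two vowel letters makes them count as two Vs in this sketch, which may or may not be what is wanted.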

Permalink 10:10:36 am, by skell
Categories: Announcements

Editing procedure after autohyphenation

Now that I have (almost) worked through the affix list and autohyphenated as many affixes as possible, I'm finding that editing the alphabetical files goes significantly more quickly. This is my editing procedure now:

-edit root entry

-autohyphenate all instances of that root in complex words in the file

-first pass: skim through all entries with that root and clean them up as best I can - mainly tagging any remaining morphemes and making sure gloss tags are placed properly in defs

-second pass: check that everything on the Lexware printout is present in the entries, and correct any autohyphing errors.

I could consider NOT even looking at the filecards from here on, to limit my obsessive need to proofread in triplicate. :-)


Permalink 02:33:29 pm, by mholmes
Categories: G&T Hours

MDH: 262 - 1 = 261 hours G&T

Leaving early.

Permalink 02:07:11 pm, by mholmes
Categories: Activity log

Vol 21 to print-on-demand; first vol 22 review encoded

Out with the old, in with the new. Vol 21 is officially done, and I've now encoded the first review for vol 22. There's one more already waiting for encoding.

Permalink 11:52:31 am, by mholmes
Categories: Activity log

File encoding for JSON gazetteer

We're experimenting with a JSON version of the gazetteer, and the biggest problem has been getting eXist to serialize it with the right character encoding in the header. After trying a lot of different approaches, this is what worked in the end (snippet from controller.xql):

(: Serve the generated JSON with an explicit UTF-8 charset in the Content-Type header. :)
else if (text:matches-regex($exist:path, '/*\.json')) then
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <forward absolute="yes" url="/rest/db/generated/{substring-before($exist:resource, '.json')}.json">
            <set-header name="Content-Type" value="application/json; charset=utf-8"/>
        </forward>
    </dispatch>

This is bound to crop up again, so I'm blogging it.


Permalink 04:26:54 pm, by mholmes
Categories: Activity log

Progress on transformation scenarios

Spent the day learning the nitty-gritty of creating complex scenarios for building various things, using Ant. More documentation when I have time. This is very useful stuff once you figure it out.


Permalink 05:24:15 pm, by mholmes
Categories: G&T Hours

MDH: 261 + 1 = 262 hours G&T

On late duty.

Permalink 04:57:45 pm, by mholmes
Categories: Activity log

Fixed a rendering bug

Places with no Agas Map link were not showing any other geo or linked-data information. I've now uncoupled the rendering of those discrete sets of data in general.xsl.

Permalink 04:56:40 pm, by mholmes
Categories: Activity log

Gazetteer output in JSON

I've been playing around with a test phone app at home that works with the gazetteer data, and I think it would be useful (for many other reasons too) to make that data available in JSON format. I've added JSON output to the utility XSLT that generates the regular XML version, and added the result to the repo and to eXist.

Permalink 04:41:49 pm, by mholmes
Categories: Activity log

TEI Journal: working on images...

...for an article in Issue 7.

