Spent an hour working on the rather thorny XSLT to rationalize the entry structures. This is one of those jobs where the real needs only become apparent as you start working through the process and looking at the output from early code. SK and I are gradually figuring out what we need and how to do it.
In the tei_xml folder, I have added a file called qw-glot-test.xml.
It contains three entries copied from qw-glot.xml which should be good test cases for all the transformations we hope to be able to do.
We met this morning to discuss ways to relieve SK of some of her more tedious work during the editing process. We decided that some processes can be automated by running XSLT on the files still awaiting editing. These are the details (from which I'll later write the XSLT):
- Where an <entry> has multiple <form> elements:
  - Keep only the first one.
  - Append the contents of each <pron> in the subsequent <form>s to the <pron> in the first <form>.
- Where an <entry> has multiple <sense> elements:
  - Copy the contents of #2ff to the end of #1.
  - Delete #2ff.
- Delete any empty <sense> elements.
- In every <quote>:
  - Start by adding this hard-coded content at the beginning:
    <phr type="p" subtype="u"></phr><bibl>ECH</bibl>
  - Next, wrap the first text node in a <phr type="n"> tag.
  - Append any <bibl> which is a following-sibling of the parent <quote>.
  - gloss[parent::quote] should be changed to <seg>.
  - Then find any asterisk in its content, and wrap a <gloss> tag around that word, removing the asterisk.
  - Next, append any <bibl> which is a following-sibling of the parent <quote>.
  - Output the rest as-is (should there be anything else?).
Finally, there needs to be a change to the server-side code, because "phonemic" and "narrow" as values of @type will be changed to "p" and "n" respectively, while @subtype will go from "unattested" to "u".
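The <form> and <sense> rules in the list above can be sketched as follows. This is a rough illustration in Python rather than the XSLT itself, and it rests on several assumptions: the elements are treated as un-namespaced (the real TEI files may need a namespace prefix), the function names are hypothetical, and a single space is used as the separator when contents are joined, which the notes above do not specify.

```python
import xml.etree.ElementTree as ET


def append_contents(target, source, sep=" "):
    """Append source's text and child elements to the end of target.

    The separator is an assumption; the notes do not say how merged
    contents should be delimited.
    """
    if not (source.text or "").strip() and len(source) == 0:
        return  # nothing to move
    target_has_content = bool((target.text or "").strip()) or len(target) > 0
    addition = (sep if target_has_content else "") + (source.text or "")
    if len(target) > 0:
        # Text following the last child lives in that child's tail.
        last = target[-1]
        last.tail = (last.tail or "") + addition
    else:
        target.text = (target.text or "") + addition
    target.extend(list(source))


def rationalize_entry(entry):
    """Merge extra <form> and <sense> children of one <entry> into the first."""
    forms = entry.findall("form")
    if len(forms) > 1:
        first_pron = forms[0].find("pron")
        for extra in forms[1:]:
            for pron in extra.findall("pron"):
                if first_pron is None:
                    first_pron = ET.SubElement(forms[0], "pron")
                append_contents(first_pron, pron)
            entry.remove(extra)

    senses = entry.findall("sense")
    for extra in senses[1:]:
        append_contents(senses[0], extra)
        entry.remove(extra)

    # Delete any <sense> that is still empty after merging.
    for sense in entry.findall("sense"):
        if len(sense) == 0 and not (sense.text or "").strip():
            entry.remove(sense)
```

Running something like this against the three entries in qw-glot-test.xml and comparing the result by eye would be a reasonable way to check the rules before committing them to XSLT.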
Sarah wrote:
I have been digging through the file boxes looking at the Particle file, and as far as I can see, most of the Particle file cards have NOT been entered. There is only Lexware printout for the first 3 file cards (plus one sentence off the 4th card), and this corresponds to affix-part.xml.
So we need to enter the rest of the Particle file directly from the cards.
-----------
Ewa wrote:
I have just been looking through the particle file cards, and doing some global searches for examples containing the particles. Based on a random sampling I think it is (almost) safe to say that all the dictegs in the particle file are already input as examples!
--------
We decided to proceed by inputting just the forms and defs for all the particles now - so that they all have entries with xml:ids.
Later, we will figure out some way to connect the particles to their relevant dictegs. It’s easy to find the dictegs by searching through the files all at once. So we will need to get Martin to help us connect the particles and the “found” examples.
-Use Find-and-Replace to globally remove the following comments:
<!--Form for the core entry-->
<!--Definition for the core entry-->
<!--Not yet edited-->
-For unattested entries for roots in isolation, check MDK's card to determine whether MDK or ECH added the root entry, and add an appropriate <note> to the entry.
-Replace any instances of ḥ (composed of h and COMBINING DOT BELOW) with ḥ (LATIN SMALL LETTER H WITH DOT BELOW).
-Replace any transcribed R's with h, ḥ, or ʕ as appropriate.
-Keep the angle brackets < > around glosses added by ECH - e.g.
frozen <freeze>
This differentiates meanings given by native speakers from glosses added by ECH.
-Leave clitics in main entries; do not make examples with clitics into dictegs.
-Explicitly mark up editorial decisions with <note>s. (See for example the notes on ḥaƛʼ-1 and ḥaƛʼ-2.)
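Two of the guidelines above are purely mechanical: removing the three placeholder comments, and replacing h plus COMBINING DOT BELOW with the precomposed letter. A minimal sketch of both, assuming the file has been read into a string, might look like this. The function name is hypothetical, and the replacement is deliberately limited to this one letter rather than normalizing the whole file, since other characters may be decomposed on purpose.

```python
# The three placeholder comments named in the guidelines.
PLACEHOLDER_COMMENTS = (
    "<!--Form for the core entry-->",
    "<!--Definition for the core entry-->",
    "<!--Not yet edited-->",
)

DECOMPOSED_H = "h\u0323"   # h + COMBINING DOT BELOW
PRECOMPOSED_H = "\u1e25"   # LATIN SMALL LETTER H WITH DOT BELOW


def clean_entry_text(xml_text):
    """Strip the placeholder comments and compose h + dot below."""
    for comment in PLACEHOLDER_COMMENTS:
        xml_text = xml_text.replace(comment, "")
    return xml_text.replace(DECOMPOSED_H, PRECOMPOSED_H)
```

The remaining guidelines (choosing h, ḥ, or ʕ for a transcribed R, attributing roots to MDK or ECH, adding <note>s) need editorial judgment and are not candidates for this kind of search-and-replace.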
I have now created entries for the following lexical suffixes in lex-suf-new.xml, as I could not find them in lex-suf.xml, lex-suf-nom.xml, or affix.xml:
úlˀəxʷ (xml:id="ulˀəxW")
xn (xml:id="xn")
qín, qn (xml:id="qin")
ús "eye, face, fire, road" (xml:id="us")
WE NEED TO GO BACK THROUGH THE LEXICAL SUFFIX FILE CARDS, AS IT APPEARS NOT ALL OF THEM HAVE BEEN ENTERED!
We need a space between the <phr> and the <seg><gloss> in all dictegs.
Right now, dictegs in the affix file are displaying on the database site with the gloss jammed right up against the ] of the <phr>.
I will fix this by adding a space between </phr> and <seg> throughout the affix file, with find & replace.
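For the record, the find & replace amounts to the following. This is a sketch only, with a hypothetical function name. It assumes the problem cases are those where </phr> and <seg> are directly adjacent with no whitespace at all, so pairs already separated by a space or line break are left alone and running it twice changes nothing further.

```python
def add_space_before_seg(xml_text):
    """Insert a single space between an adjacent </phr> and <seg>."""
    return xml_text.replace("</phr><seg>", "</phr> <seg>")
```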
The affix.xml file, which has just been completed, needed to be processed with the XSLT I'd previously written to convert the erroneous transcriptions of glottalizations (a result of our original conversion) to the current system. This XSLT has been run successfully on all the other files in the system. However, when it was run on affix.xml in oXygen, it simply did nothing; the old transcriptions remained untransformed.
I could not figure out what the problem was. Running Saxon at the command line didn't help either. There may be something rather odd about that file, though. There are two symptoms of oddity: 1) when opening the file in oXygen, I got a warning about bidirectional features being turned off due to the file size (which may indicate that there's something oddly bidi about it, although searches for the bidi-change-trigger characters produced nothing); and 2) when copying and pasting the contents of the file from oXygen to Transformer running under Wine, only three characters were pasted: an x, a 5, and a control character, "device control 2" (Unicode U+0012). However, a search for this character finds nothing in the document.
In the end, I gave up and used Transformer to do a conversion using search-and-replace. There may be some problem simply with the size of the file (1.1 MB) -- it's larger than any of the others. But I'll keep an eye on that file.
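One way to keep an eye on the file without relying on the editor's search or the clipboard would be to scan the decoded text directly for control characters and explicit bidirectional formatting characters. The sketch below is only an illustration: the function name is hypothetical, and the list of bidi characters is my own selection of the usual marks, embeddings, overrides, and isolates, not an exhaustive set.

```python
import unicodedata

# Explicit bidirectional formatting characters (assumed list, not exhaustive).
BIDI_CONTROLS = set(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)
# Control characters that are expected in any XML file.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_suspicious_chars(text):
    """Return (offset, 'U+XXXX') pairs for control and bidi formatting characters."""
    hits = []
    for offset, ch in enumerate(text):
        is_control = unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
        if is_control or ch in BIDI_CONTROLS:
            hits.append((offset, f"U+{ord(ch):04X}"))
    return hits
```

An empty result for the text of affix.xml (read with UTF-8 decoding) would suggest the stray U+0012 came from the copy-and-paste path rather than from the file itself, though that is only a guess at this stage.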
None of the example sentences in dictegs in affix.xml have been phonemicized.
When/if Ewa adds these phonemicizations, she will also need to move the <bibl>s for the attested examples, so that they refer to the narrow transcription and the gloss, not the whole <quote>.
For example, change this:
<cit>
<quote>
<phr type="narrow">kˀɬ‐√kʷan‐xʷ=cn</phr>
<seg><gloss>I answered him</gloss></seg>
</quote>
<bibl>JM3.57.3</bibl>
</cit>
To this:
<cit>
<quote>
<phr type="phonemic" subtype="unattested">kʼɬ‐√kʷan‐xʷ=cn</phr><bibl>ECH</bibl>
<phr type="narrow">kʼɬ‐√kʷan‐xʷ=cn</phr><bibl>JM3.57.3</bibl>
<seg><gloss>I answered him</gloss></seg><bibl>JM3.57.3</bibl>
</quote>
</cit>
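The mechanical part of this change (moving the <bibl> and adding the ECH attribution) could be automated, leaving only the phonemicization itself to be typed in. The following is a sketch under stated assumptions, not the project's actual transformation: the function name is hypothetical, elements are treated as un-namespaced, each <cit> is assumed to have a single <bibl> after its <quote>, and the phonemic <phr> is left empty as a placeholder for the editor to fill.

```python
import copy
import xml.etree.ElementTree as ET


def restructure_cit(cit):
    """Move the <cit>-level <bibl> inside <quote>, after the narrow <phr> and the <seg>."""
    quote = cit.find("quote")
    source_bibl = cit.find("bibl")
    if quote is None or source_bibl is None:
        return

    # Empty placeholder for the phonemic form, attributed to ECH.
    phonemic = ET.Element("phr", {"type": "phonemic", "subtype": "unattested"})
    ech = ET.Element("bibl")
    ech.text = "ECH"
    new_children = [phonemic, ech]

    for child in list(quote):
        new_children.append(child)
        is_narrow = child.tag == "phr" and child.get("type") == "narrow"
        if is_narrow or child.tag == "seg":
            bibl = copy.deepcopy(source_bibl)
            # Keep the original whitespace after the new <bibl>, not before it.
            bibl.tail = child.tail
            child.tail = None
            new_children.append(bibl)

    for child in list(quote):
        quote.remove(child)
    quote.extend(new_children)
    cit.remove(source_bibl)
```

Examples whose <cit> already has the <bibl>s inside the <quote> have no <cit>-level <bibl> left, so the function returns without touching them.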