Summarizing transcription data changes
Posted by mholmes on 16 Feb 2010 in Activity log
This is a summary of the global changes we'll be making to the XML data, based on a re-reading of all the relevant posts from 2007:
- The Glottalized Ejective class has the following members:
p’, t’, c’, ƛ’, k’, q’
- The Glottalized Sonorant/Resonant class has these members:
mˀ, nˀ, lˀ, ḷˀ, rˀ, wˀ, yˀ, ʕˀ
- The former are currently transcribed in the already-processed files using raised glottals (e.g. tˀ). The raised glottal is U+02c0. These need to be transformed into U+02bc: MODIFIER LETTER APOSTROPHE, "glottal stop, glottalization, ejective". This letter is valid as part of an xml:id attribute, so we could do a global conversion there, using Transformer rather than an XSLT identity transform.
- However, in the partially-transformed files, it appears that all of these items have been transcribed using actual apostrophes. This means we can't use Transformer, because there are valid English sequences containing e.g. t+apostrophe; only in the context of the TEI tags which contain Moses script should the conversions take place. Therefore we will have to use an XSLT identity transform to accomplish this conversion.
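To make the difference between the two cases concrete, here is a minimal Python sketch (not Transformer itself, and not the XSLT we would actually run). It assumes the ejective base letters are exactly the six listed above and that the "actual apostrophe" is U+0027; the function names are made up for illustration.

```python
import re

RAISED_GLOTTAL = "\u02c0"       # the raised glottal named in this post
MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE

# Base letters of the Glottalized Ejective class: p, t, c, ƛ (U+019B), k, q.
EJECTIVE_BASES = "ptc\u019bkq"


def fix_processed(text: str) -> str:
    """Ejective + raised glottal -> ejective + U+02BC; sonorants are untouched."""
    pattern = f"([{EJECTIVE_BASES}]){RAISED_GLOTTAL}"
    return re.sub(pattern, "\\1" + MODIFIER_APOSTROPHE, text)


def naive_apostrophe_fix(text: str) -> str:
    """Unsafe on whole files: it also rewrites ordinary English apostrophes.

    Assumes the apostrophe in the files is U+0027.
    """
    pattern = f"([{EJECTIVE_BASES}])'"
    return re.sub(pattern, "\\1" + MODIFIER_APOSTROPHE, text)
```

The first function is safe to run over a whole file because a raised glottal after one of these letters never occurs in English text. The second shows the problem: given "it's", it replaces the apostrophe in a plain English word, which is why the apostrophe conversion has to be limited to the Moses-script tags.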
The plan, therefore, is this:
- For the completed and in-process files, I think the only conversion we need is from Ejective + raised glottal (U+02c0) to Ejective + U+02bc. This can be done universally, using Transformer.
- For the partially-transformed files, we need to write an XSLT identity transform which targets a specific list of only those TEI tags which contain Moses script. The transform will map Ejective + apostrophe to Ejective + U+02bc (modifier letter apostrophe), and will map Sonorant/Resonant + apostrophe to Sonorant/Resonant + U+02c0 (raised glottal).
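The tag-scoped conversion in the second step can be sketched roughly as follows. This is a Python stand-in for the logic, not the XSLT identity transform itself. The tag names in MOSES_TAGS are hypothetical placeholders (the real list of Moses-script tags is not given here), and the sketch assumes the apostrophes in the files are U+0027 or U+2019 and that ḷ is stored precomposed (U+1E37).

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical placeholders: the real list of Moses-script tags must be supplied.
MOSES_TAGS = {f"{{{TEI_NS}}}{name}" for name in ("phr", "m")}

APOSTROPHES = "'\u2019"                  # assumption: U+0027 or U+2019
EJECTIVE_BASES = "ptc\u019bkq"           # p t c ƛ k q
SONORANT_BASES = "mnl\u1e37rwy\u0295"    # m n l ḷ r w y ʕ (precomposed ḷ assumed)

EJECTIVE = re.compile(f"([{EJECTIVE_BASES}])[{APOSTROPHES}]")
SONORANT = re.compile(f"([{SONORANT_BASES}])[{APOSTROPHES}]")


def convert(text):
    """Map ejective + apostrophe to U+02BC and sonorant + apostrophe to U+02C0."""
    if not text:
        return text
    text = EJECTIVE.sub("\\1\u02bc", text)
    return SONORANT.sub("\\1\u02c0", text)


def convert_moses(root):
    """Convert text only inside Moses-script elements; everything else is copied as-is."""
    for el in root.iter():
        if el.tag not in MOSES_TAGS:
            continue
        el.text = convert(el.text)
        for child in el.iter():
            if child is not el:
                # A child's text and tail both lie inside the Moses element.
                child.text = convert(child.text)
                child.tail = convert(child.tail)


sample = f'<text xmlns="{TEI_NS}"><p>It\'s here: <phr>t\'am\'</phr></p></text>'
root = ET.fromstring(sample)
convert_moses(root)
```

In the sample, the apostrophe in "It's" is left alone because it sits in a plain p element, while the contents of the (hypothetical) phr element become t + U+02bc, a, m + U+02c0. The conversion is idempotent, so a Moses tag nested inside another Moses tag is handled safely even though its text is visited twice.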