Future structural changes to encoding
We've discussed the structural changes outlined below and agreed that they're desirable, but that they should be put off until later because they increase the amount of code we'll have to edit. One approach is to write XSLT that applies these changes and to make the transformed versions of our XML files available through the website, while we continue to edit the unchanged versions behind the scenes; then, when the time is right, we can convert everything permanently. Here are the details:
I've been reading through LR's jTEI paper with a view to bringing our encoding more into alignment with the recommendations there (which should also make it more amenable to LMF-ication), and I think we should reorganize the way we're doing citations a bit. At the moment, we have this:
<cit>
  <quote>
    <phr type="p" subtype="i">s-√cə́s=lqs kˀʷáʔncás</phr>
    <bibl corresp="psn:ECH">ECH</bibl>
    <phr type="n">s-√cə́s=əlqs kˀʷáʔəncás</phr>
    <bibl corresp="psn:JM psn:AM">Y14.219,220</bibl>
    <seg>a mosquito bit me</seg>
    <bibl corresp="psn:JM psn:AM">Y14.219,220</bibl>
  </quote>
</cit>
In this, we rely on contiguity to associate each <bibl> with its preceding element, and we rely on <phr> and <seg> to distinguish original from translation. What we might do instead would look like this:
<cit>
  <cit type="example">
    <cit>
      <quote xml:lang="col" type="p" subtype="i">s-√cə́s=lqs kˀʷáʔncás</quote>
      <bibl corresp="psn:ECH">ECH</bibl>
    </cit>
    <cit>
      <quote xml:lang="col" type="n">s-√cə́s=əlqs kˀʷáʔəncás</quote>
      <bibl corresp="psn:JM psn:AM">Y14.219,220</bibl>
    </cit>
  </cit>
  <cit type="translation">
    <quote xml:lang="en">a mosquito bit me</quote>
    <bibl corresp="psn:JM psn:AM">Y14.219,220</bibl>
  </cit>
</cit>
This is considerably more verbose, but it makes more things explicit. It uses nested <cit> tags to ensure that each <quote> is paired with its own <bibl>, and that each <quote> has the required @xml:lang setting. The second level of <cit> is divided into @type="example" and @type="translation" (following recommendations in the TEI Guidelines), and the @type and @subtype values are realized directly on <quote>, rather than requiring the use of <phr> or <seg>.
The obvious drawback is that there's more code here. Existing <cit>s should be easy to convert to this framework with XSLT, though.
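A conversion of this kind might be sketched as the XSLT 1.0 stylesheet below. It is only a sketch, and it rests on several assumptions that are not stated above: that our files are in the TEI namespace, that an old-style <quote> contains nothing but <phr>, <seg> and <bibl> children, that each <bibl> immediately follows the element it documents, and that the language codes are always "col" for examples and "en" for translations, as in the sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: anything not matched below is copied unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Old-style citation: a <cit> whose <quote> contains <phr> children.
       Assumption: the <quote> holds only <phr>, <seg> and <bibl>. -->
  <xsl:template match="tei:cit[tei:quote/tei:phr]">
    <cit>
      <!-- Keep any attributes the original <cit> carried. -->
      <xsl:copy-of select="@*"/>
      <cit type="example">
        <xsl:apply-templates select="tei:quote/tei:phr"/>
      </cit>
      <xsl:apply-templates select="tei:quote/tei:seg"/>
    </cit>
  </xsl:template>

  <!-- Each <phr> becomes a <cit> holding a <quote> and its own <bibl>.
       Assumption: "col" is the right language code for every example. -->
  <xsl:template match="tei:quote/tei:phr">
    <cit>
      <quote xml:lang="col">
        <!-- Move @type and @subtype from <phr> onto <quote>. -->
        <xsl:copy-of select="@type | @subtype"/>
        <xsl:apply-templates select="node()"/>
      </quote>
      <!-- Copy the next sibling element, but only if it is a <bibl>. -->
      <xsl:copy-of select="following-sibling::*[1][self::tei:bibl]"/>
    </cit>
  </xsl:template>

  <!-- Each <seg> becomes a translation <cit>.
       Assumption: translations are always in English. -->
  <xsl:template match="tei:quote/tei:seg">
    <cit type="translation">
      <quote xml:lang="en">
        <xsl:apply-templates select="node()"/>
      </quote>
      <xsl:copy-of select="following-sibling::*[1][self::tei:bibl]"/>
    </cit>
  </xsl:template>

</xsl:stylesheet>
```

Because a converted <cit> no longer has a <quote> containing <phr>, running the stylesheet a second time should leave already-converted citations alone. A <phr> or <seg> followed by more than one <bibl> would keep only the first, so the data would need checking for such cases before any permanent conversion.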
Similarly, we currently have things that look like this:
<pron>
  <seg type="p">hámp</seg>
  <bibl corresp="psn:J psn:MS">J3.72-74,78; MS1.53</bibl>
  <seg type="n">hə́mp</seg>
  <bibl corresp="psn:JM psn:AM">Y24.90; Y29.179; Y6.282</bibl>
</pron>
where the association between <seg> and <bibl> again depends on sequence. I wonder if we might be better off with two <pron>s:
<pron type="p">
  <seg>hámp</seg>
  <bibl corresp="psn:J psn:MS">J3.72-74,78; MS1.53</bibl>
</pron>
<pron type="n">
  <seg>hə́mp</seg>
  <bibl corresp="psn:JM psn:AM">Y24.90; Y29.179; Y6.282</bibl>
</pron>
where the @type attribute is applied to the <pron> element, and the <bibl> is unambiguously associated with the appropriate <pron>.
Again, it's slightly more code, but it seems cleaner, and as I try to map our data onto the sorts of structures allowed by Lexus, it looks like this sort of approach will work better.
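Splitting a combined <pron> could be handled in the same way. The stylesheet below is a rough XSLT 1.0 sketch rather than a finished conversion; it assumes the TEI namespace, assumes that each typed <seg> is immediately followed by its <bibl>, and discards any attributes on the original <pron> along with any <seg> that has no @type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: anything not matched below is copied unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A <pron> holding typed <seg> children is replaced by one new
       <pron> per typed <seg>; the original wrapper is not output. -->
  <xsl:template match="tei:pron[tei:seg/@type]">
    <xsl:apply-templates select="tei:seg[@type]"/>
  </xsl:template>

  <!-- Each typed <seg> yields a <pron> carrying that @type, an untyped
       <seg>, and the <bibl> that directly follows it (assumed adjacent). -->
  <xsl:template match="tei:pron/tei:seg[@type]">
    <pron type="{@type}">
      <seg>
        <xsl:apply-templates select="node()"/>
      </seg>
      <xsl:copy-of select="following-sibling::*[1][self::tei:bibl]"/>
    </pron>
  </xsl:template>

</xsl:stylesheet>
```

Since the new <seg> elements have no @type, a <pron> that has already been split no longer matches the first template, so the transformation can be applied on the fly for the website and later run once more for the permanent conversion.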