Handling of unattested glosses
Unattested glosses, originally marked with angle brackets, appear in two places: <def>s and <cit>s. Those in <def>s should be converted into something more suitable; once that's done, those in <cit>s can be removed. Here is a complex example of the form they take in a <def>:
<def> <!--Generated from: [ *clabber|ed milk <*sour>; *cottage~*cheese <*sour> ]--> <seg><gloss>clabbered</gloss> milk &lt;<gloss>sour&gt;</gloss>; <gloss>cottage~*cheese</gloss> &lt;<gloss>sour&gt;</gloss> </seg> <bibl corresp="psn:JM psn:AM">Y14.182</bibl> </def>
This needs to be converted so that the unattested gloss is lifted out of its context and turned into a new <seg> with a <bibl> ascribing it to ECH:
<def>
<!--Generated from: [ *clabber|ed milk <*sour>; *cottage~*cheese <*sour> ]-->
<seg>
<gloss>clabbered</gloss> milk; <gloss>cottage~*cheese</gloss>
</seg>
<bibl corresp="psn:JM psn:AM">Y14.182</bibl>
<seg><gloss type="i">sour</gloss></seg><bibl corresp="psn:ECH">ECH</bibl>
</def>
Note that the original contains two instances of the same unattested gloss, but the output should contain only one, so I'm using distinct-values() in the XSLT. Also note that the opening angle-bracket entity sits outside the <gloss> tag but still needs to be removed. I've now written the XSLT for this, and I'll run it tomorrow morning.
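The same steps can be sketched in Python with the standard library's ElementTree. This is an illustration of the logic, not a port of the actual XSLT: find each <gloss> inside a <seg> whose text ends in ">" and which is directly preceded by a "<", remove it together with the stray bracket, de-duplicate the lifted text, and append a new <seg> plus a <bibl corresp="psn:ECH"> to the <def>. It assumes un-namespaced elements (real TEI files would need the TEI namespace in the tag names), assumes the raw file stores the brackets as &lt; and &gt;, and emits one <seg>/<bibl> pair per distinct gloss; lift_unattested is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def lift_unattested(def_el: ET.Element) -> None:
    """Move bracketed (unattested) glosses out of each <seg> in a <def>.

    Hypothetical helper. Assumes un-namespaced elements and that the raw
    file encodes the brackets as &lt; (before <gloss>) and &gt; (inside it).
    """
    lifted = []
    for seg in def_el.findall("seg"):
        prev = None  # last kept child; None means seg.text precedes the child
        for child in list(seg):
            before = (seg.text if prev is None else prev.tail) or ""
            unattested = (
                child.tag == "gloss"
                and (child.text or "").rstrip().endswith(">")
                and before.rstrip().endswith("<")
            )
            if not unattested:
                prev = child
                continue
            lifted.append((child.text or "").rstrip().rstrip(">").strip())
            # Drop the stray "<" (and the space before it), keep the tail text.
            joined = before.rstrip()[:-1].rstrip() + (child.tail or "")
            if prev is None:
                seg.text = joined
            else:
                prev.tail = joined
            seg.remove(child)
    # De-duplicate like distinct-values(); dict.fromkeys also keeps first-seen order.
    for text in dict.fromkeys(lifted):
        new_seg = ET.SubElement(def_el, "seg")
        ET.SubElement(new_seg, "gloss", type="i").text = text
        ET.SubElement(def_el, "bibl", corresp="psn:ECH").text = "ECH"


# Keep the "Generated from" comment by asking the parser to retain comments.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
entry = ET.fromstring(
    "<def><!--Generated from: [ *clabber|ed milk <*sour>; *cottage~*cheese <*sour> ]-->"
    "<seg><gloss>clabbered</gloss> milk &lt;<gloss>sour&gt;</gloss>; "
    "<gloss>cottage~*cheese</gloss> &lt;<gloss>sour&gt;</gloss> </seg>"
    '<bibl corresp="psn:JM psn:AM">Y14.182</bibl></def>',
    parser=parser,
)
lift_unattested(entry)
print(ET.tostring(entry, encoding="unicode"))
```

Both "sour" glosses are removed from the original <seg>, but only one new <seg> is added, because the lifted values are de-duplicated before anything is appended.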
Once that job is done, the only remaining unattested glosses will be in <cit>s, and they can be commented out. You can find them with this regex:
(&lt;<gloss>[^<]+&gt;</gloss>)
and replace them with:
<!-- $1 -->
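If the replacement is run from a script rather than an editor, Python's re.sub can do the same job, with the editor's $1 written as \1. A minimal sketch, assuming the raw file stores the brackets as &lt; and &gt; entities; comment_out_tagged and the sample string are hypothetical.

```python
import re

# An escaped "<" followed by a <gloss> whose text ends in an escaped ">".
# Assumption: the raw file stores the brackets as &lt; and &gt;.
TAGGED = re.compile(r"(&lt;<gloss>[^<]+&gt;</gloss>)")


def comment_out_tagged(xml_text: str) -> str:
    # Python spells the editor's $1 back-reference as \1.
    return TAGGED.sub(r"<!-- \1 -->", xml_text)


sample = "milk &lt;<gloss>sour&gt;</gloss>; <gloss>cheese</gloss>"
print(comment_out_tagged(sample))
```

Ordinary glosses such as <gloss>cheese</gloss> are left alone, because the pattern only matches when the &lt; entity immediately precedes the tag.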
There are also unattested glosses with no <gloss> tags at all:
<seg> <gloss>clabber|ed</gloss> milk &lt;*sour&gt;; *cottage~*cheese &lt;*sour&gt; </seg>
Those can be matched with:
(&lt;[^&<]+&gt;)
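A Python version of this search, again assuming the brackets are stored as &lt; and &gt; entities in the raw file, which means the literal <*sour> inside the "Generated from" comments is never matched. The sketch only lists the hits so they can be reviewed; applying the same <!-- $1 --> replacement would comment them out. find_untagged is a hypothetical name.

```python
import re

# Bracketed glosses with no <gloss> tag, e.g. "&lt;*sour&gt;".
# Excluding "&" and "<" keeps a match from running past the closing
# entity or into a neighbouring tag.
UNTAGGED = re.compile(r"(&lt;[^&<]+&gt;)")


def find_untagged(xml_text: str) -> list[str]:
    return UNTAGGED.findall(xml_text)


line = ("<seg> <gloss>clabber|ed</gloss> milk &lt;*sour&gt;; "
        "*cottage~*cheese &lt;*sour&gt; </seg>")
for hit in find_untagged(line):
    print(hit)
```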