More on regularizations
Posted by mholmes on 11 Aug 2011 in Activity log
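Further research on the regularizations suggests that those which handle only i/j and u/v variation are also unnecessary in the original transcription field. Using this XQuery, which matches any choice element whose orig and reg are identical once all s/ſ, u/v and i/j characters have been removed, I was able to identify 573 choice elements, comprising 340 distinct orig/reg pairs: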
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

let $list := (
    for $c in //choice[orig]
    where translate($c/orig, 'sſuvUVijIJ', '') = translate($c/reg, 'sſuvUVijIJ', '')
    return concat($c/orig, ' : ', $c/reg)
  ),
  $distList := distinct-values($list)
return (
  concat(count($list), ' items, of which there are ', count($distList), ' distinct values: '),
  $distList
)
If CC agrees that all these are irrelevant, it would be easy to remove them. That would leave 247 other regularizations, which mainly seem to consist of supplying missing spacing and apostrophes. These could presumably be kept.
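The remaining regularizations could be listed for review by negating the test in the query above. What follows is only a sketch under that assumption: it has not been run against the project's documents, so the counts it reports may not match the 247 quoted here, and it also picks up any choice element that has an orig but no reg.

```xquery
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Complement of the query above: choice elements whose orig and reg
   still differ after all s/ſ, u/v and i/j characters are removed. :)
let $list := (
    for $c in //choice[orig]
    where not(translate($c/orig, 'sſuvUVijIJ', '') = translate($c/reg, 'sſuvUVijIJ', ''))
    return concat($c/orig, ' : ', $c/reg)
  ),
  $distList := distinct-values($list)
return (
  concat(count($list), ' items, of which there are ', count($distList), ' distinct values: '),
  $distList
)
```

Reading through the distinct values would show whether the leftover cases really are mostly missing spaces and apostrophes before deciding which ones to keep.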
This entry was posted by Martin and filed under Activity log.