Handy XQuery for finding possibly-bad page links
Posted by mholmes on 01 Nov 2011 in Activity log
These two blocks of XQuery will search for page-image links in the header <biblScope> and in <pb> tags and report any that don't match the expected pattern. That doesn't mean they're bad, just that they need checking.
header <biblScope>s:
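xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Report startPageImage links whose file name does not begin with the
   folder name, or which do not match the folder/file naming pattern.
   The pattern is anchored and the dot before "jpg" is escaped, so the
   whole value has to match. :)
for $b in //biblScope[@type="startPageImage"]
let $bits := tokenize($b/@facs, "/")
where not(starts-with($bits[2], $bits[1]))
   or not(matches($b/@facs, '^(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}/(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}_[0-9]{5}[rv]\.jpg$'))
return (xs:string($b/ancestor::TEI/@xml:id), $b)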
<pb> tags in the body:
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Same check for page breaks: the link is held in @n rather than @facs.
   The pattern is anchored and the dot before "jpg" is escaped, so the
   whole value has to match. :)
for $pb in //pb[@n]
let $bits := tokenize($pb/@n, "/")
where not(starts-with($bits[2], $bits[1]))
   or not(matches($pb/@n, '^(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}/(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}_[0-9]{5}[rv]\.jpg$'))
return (xs:string($pb/ancestor::TEI/@xml:id), $pb)
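Since both queries apply the same test, they could probably be folded into one. What follows is only a sketch: the variable $pattern and the function local:suspect are names made up for illustration, and the pattern here is anchored with ^ and $ and has the dot before "jpg" escaped, on the assumption that the whole attribute value is meant to match.

```xquery
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical shared pattern: folder name, slash, then file name. :)
declare variable $pattern as xs:string :=
  '^(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}/(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}_[0-9]{5}[rv]\.jpg$';

(: Hypothetical helper: true if the link needs checking by hand. :)
declare function local:suspect($link as xs:string) as xs:boolean {
  let $bits := tokenize($link, "/")
  return not(starts-with($bits[2], $bits[1]))
      or not(matches($link, $pattern))
};

(: Collect the link attributes from both places and test each one. :)
for $link in (//biblScope[@type="startPageImage"]/@facs, //pb/@n)
where local:suspect(string($link))
return (xs:string($link/ancestor::TEI/@xml:id), $link/..)
```

As in the separate queries, each hit comes back as the xml:id of the containing document followed by the element that carries the link. One difference to be aware of: a startPageImage <biblScope> with no @facs at all is reported by the separate query but skipped here, because this version iterates over the attributes themselves.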