Automated detection take 2
Posted by mholmes on 10 Oct 2012 in Activity log
It turns out that matching against only <seg type="n"> misses most of the matches, because entries that haven't yet been processed don't have type attributes. Thus take 2, which finds 670 matches:
declare default element namespace "http://www.tei-c.org/ns/1.0";
import module namespace util="http://exist-db.org/xquery/util";

(: For each noun phrase quoted in the suffix lexicon, strip the boundary
   markers from its first text node and look for a pron/seg with the same text.
   seg is matched without a @type predicate, because unprocessed entries
   don't have type attributes yet. :)
for $p in collection('/db/moses/')//TEI[@xml:id='lex-suff']//phr[@type='n'][parent::quote]
let $target := normalize-space(translate($p/text()[1], '+-=√‐', '')),
    $matches := collection('/db/moses/')//pron/seg[text() = $target]
return
    if (count($matches) gt 0) then
        concat('*** ', $p/ancestor::entry/@xml:id, ' (', $target, ') [', $p/text(), '] matches ',
               collection('/db/moses/')//pron/seg[text() = $target][1]/ancestor::entry/@xml:id)
    else
        concat(' ', $p/ancestor::entry/@xml:id, ' (', $target, ') [', $p/text(), '] has no matches. ')
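The normalization step is what makes the comparison work, so here is a minimal sketch of it in isolation. The sample forms are made up for illustration and are not taken from the Moses data; only the translate() character list is copied from the query above.

```xquery
xquery version "1.0";

(: Hypothetical sample forms, invented for this sketch. :)
for $form in (' =abc ', '√ab-c', 'a+b‐c')
(: translate() with an empty replacement string deletes every listed
   character; normalize-space() then trims and collapses whitespace. :)
return normalize-space(translate($form, '+-=√‐', ''))
```

All three sample forms reduce to the string abc, which is the bare form that then gets compared against the text of each pron/seg.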