XQuery (1.0) code for finding text in a node while ignoring specific descendants
The structure of the utterances for the FrancoToile transcripts looks something like this:
<u>Text of an utterance here...</u>
<u>Some more text here. <ref type="info">Special keyword <note>with an annotation</note></ref> here.</u>
Each utterance is within a <u> element. Some of the words and phrases in utterances are marked by annotations. These special phrases are inside <ref> elements, and the annotations that go with them are inside <note> elements.
When searching the utterances for a keyword, I needed a way to exclude all text within <note> elements from the search, since they're not part of the actual utterance text. Martin and I spent a long (long!) time trying to come up with a good solution, but couldn't find anything satisfying. Then, thanks to Stack Overflow, I finally found something that works:
//textNodeToSearch//text()[not(ancestor::note) and contains(., "searchTerm")]
Phew. This searches every text node under your element (whatever you use for textNodeToSearch) for the search term, but skips any text that sits inside a <note> element.
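To see the filter in action, here is a minimal sketch that runs the same predicate against the sample utterances from above. The inline `<body>` wrapper, the `$doc` variable and the two search terms are made up for illustration; they are not part of the FrancoToile code.

```xquery
xquery version "1.0";

(: Hypothetical inline sample, modelled on the utterance structure shown earlier. :)
let $doc :=
  <body>
    <u>Text of an utterance here...</u>
    <u>Some more text here. <ref type="info">Special keyword <note>with an annotation</note></ref> here.</u>
  </body>
return (
  (: "keyword" sits in the <ref> text, outside any <note>, so it is found. :)
  count($doc//u//text()[not(ancestor::note) and contains(., "keyword")]),
  (: "annotation" only occurs inside the <note>, so it is filtered out. :)
  count($doc//u//text()[not(ancestor::note) and contains(., "annotation")])
)
```

This should return the sequence 1, 0. One thing to keep in mind: an unprefixed name test like ancestor::note only matches elements in no namespace (unless a default element namespace is declared). If your documents are in a namespace, as TEI files are, the test needs the matching prefix, for example ancestor::tei:note.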
The complete XQuery used in the FrancoToile search, which orders the results by number of utterances found and also returns those utterances, is:
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
for $result in collection('francotoile/data')//tei:TEI[tei:text/tei:body//text()[not(ancestor::tei:note) and contains(., "searchTerm")]]
let $articleBody := $result//tei:body
let $id := $result/@xml:id
let $articleTitle := $result//tei:titleStmt/tei:title
let $timeline := $result/tei:text/tei:body/tei:timeline
for $utter in $result//tei:u
let $start := $result//tei:timeline/tei:when[@xml:id=$utter/@start]/@absolute
let $end := $result//tei:timeline/tei:when[@xml:id=$utter/@end]/@absolute
where $utter//text()[not(ancestor::tei:note) and contains(., "searchTerm")]