The structure of the utterances for the FrancoToile transcripts looks something like this:
<u>
Text of an utterance here...
</u>
<u>
Some more text here. <ref type="info">Special keyword <note>with an annotation</note></ref> here.
</u>
Each utterance is within a <u> element. Some of the words and phrases in the utterances are annotated: these special phrases are inside <ref> elements, and the annotations that go with them are inside <note> elements.
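To see why a plain search over each utterance is not enough, here is a small sketch in Python using only the standard library's xml.etree.ElementTree (not the eXist setup FrancoToile runs on). It parses the sample above, collapsed onto single lines and wrapped in a hypothetical <body> root, and prints the full string value of each <u>, which is roughly what data($u) or string(.) gives in XQuery. The annotation text shows up in that value.

```python
import xml.etree.ElementTree as ET

# The sample structure from above, wrapped in a hypothetical <body> root
SAMPLE = """<body>
<u>Text of an utterance here...</u>
<u>Some more text here. <ref type="info">Special keyword <note>with an annotation</note></ref> here.</u>
</body>"""

def string_values(root):
    """Full string value of each <u>: every descendant text node, note text included."""
    return ["".join(u.itertext()) for u in root.iter("u")]

root = ET.fromstring(SAMPLE)
for value in string_values(root):
    print(value)
```

The second utterance comes out as "Some more text here. Special keyword with an annotation here.", so a search for "annotation" would wrongly match it.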
When searching the utterances for a keyword, I needed a way to exclude all text within <note> elements from the search, since it isn't part of the actual utterance text. Martin and I spent a long (long!) time trying to come up with a good solution, but couldn't find anything satisfying. Then, thanks to Stack Overflow, I finally found something that works:
//textNodeToSearch//text()[not(ancestor::note) and contains(., "searchTerm")]
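Phew. This returns every text node under your starting node (whatever you use for textNodeToSearch) that contains the search term and does not have a <note> ancestor, so the annotation text is left out of the search. In a namespaced document such as a TEI file, the step needs the namespace prefix as well: ancestor::tei:note.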
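The same filter can be mirrored outside XQuery to check how it behaves. This is a minimal Python sketch with the standard-library ElementTree; the function names text_nodes and matching_text_nodes are made up for illustration. One ElementTree quirk matters here: the text after a child's closing tag is stored as child.tail, but it belongs to the parent, so it must not count as being inside a <note>.

```python
import xml.etree.ElementTree as ET

def text_nodes(elem, note_tag="note", inside_note=False):
    """Yield (text, inside_note) for each text node under elem, in document order.

    inside_note plays the role of XPath's ancestor::note test. Assumes elem
    itself is not nested inside a note further up the tree.
    """
    here = inside_note or elem.tag == note_tag
    if elem.text:
        yield elem.text, here
    for child in elem:
        yield from text_nodes(child, note_tag, here)
        # Text after </child> is child.tail, but it belongs to elem,
        # so it takes elem's note status, not the child's.
        if child.tail:
            yield child.tail, here

def matching_text_nodes(elem, term, note_tag="note"):
    """Approximates .//text()[not(ancestor::note) and contains(., term)]."""
    return [text for text, in_note in text_nodes(elem, note_tag)
            if not in_note and term in text]

u = ET.fromstring('<u>Some more text here. <ref type="info">Special keyword '
                  '<note>with an annotation</note></ref> here.</u>')
print(matching_text_nodes(u, "keyword"))     # ['Special keyword ']
print(matching_text_nodes(u, "annotation"))  # []
```

Like contains() applied to individual text nodes, this misses a term that spans an element boundary (for example "here. Special"). For a TEI file, pass note_tag="{http://www.tei-c.org/ns/1.0}note"; with the plain "note" nothing matches and the annotation text gets searched, which is the same mistake as writing ancestor::note in a namespaced query.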
The complete XQuery used in the FrancoToile search, which orders the results by the number of matching utterances and also returns those utterances, is:
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: One <match> per transcript whose body contains the term outside any tei:note :)
for $result in collection('francotoile/data')//tei:TEI[tei:text/tei:body//text()[not(ancestor::tei:note) and contains(., "searchTerm")]]
(: The utterances that contain the term outside any tei:note :)
let $hits := $result//tei:u[.//text()[not(ancestor::tei:note) and contains(., "searchTerm")]]
(: Transcripts with the most matching utterances come first :)
order by count($hits) descending
return
<match>
{
let $id := $result/@xml:id
let $articleTitle := $result//tei:titleStmt/tei:title
let $timeline := $result/tei:text/tei:body/tei:timeline
return
<info>
<title>{data($articleTitle)}</title>
<refid>{data($id)}</refid>
<count>{count($hits)}</count>
<timeline>{data($timeline)}</timeline>
</info>
}
<utterances>
{
(: Look up each utterance's start and end times in the timeline :)
for $utter in $hits
let $start := $result//tei:timeline/tei:when[@xml:id=$utter/@start]/@absolute
let $end := $result//tei:timeline/tei:when[@xml:id=$utter/@end]/@absolute
return
<utterance>
<start>{data($start)}</start>
<end>{data($end)}</end>
<text>{data($utter)}</text>
</utterance>
}
</utterances>
</match>
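For readers without an eXist instance, the ranking logic of the query can be approximated in Python. This is a sketch, not the FrancoToile code: search and utterance_matches are hypothetical names, documents are passed in as an in-memory dict instead of being read from the francotoile/data collection, and the start/end timeline lookup is left out. Like data($utter), the returned utterance text still includes any note text.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def utterance_matches(u, term):
    """True if some text node in u, outside any tei:note, contains term."""
    def walk(elem, in_note):
        here = in_note or elem.tag == TEI + "note"
        if elem.text and not here and term in elem.text:
            return True
        for child in elem:
            if walk(child, here):
                return True
            # child.tail belongs to elem, so it uses elem's note status
            if child.tail and not here and term in child.tail:
                return True
        return False
    return walk(u, False)

def search(docs, term):
    """docs: hypothetical mapping of document id -> parsed TEI root element.

    Returns (id, count, utterance texts) for each document with at least one
    matching utterance, most matches first (ties keep their input order).
    """
    results = []
    for doc_id, root in docs.items():
        hits = [u for u in root.iter(TEI + "u") if utterance_matches(u, term)]
        if hits:
            results.append((doc_id, len(hits), ["".join(u.itertext()) for u in hits]))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

NS = 'xmlns="http://www.tei-c.org/ns/1.0"'
docs = {
    "b": ET.fromstring(f'<TEI {NS}><text><body><u>salut <ref><note>bonjour</note></ref></u>'
                       f'<u>bonjour</u></body></text></TEI>'),
    "a": ET.fromstring(f'<TEI {NS}><text><body><u>bonjour</u><u>bonjour encore</u>'
                       f'</body></text></TEI>'),
    "c": ET.fromstring(f'<TEI {NS}><text><body><u>merci <note>bonjour</note></u>'
                       f'</body></text></TEI>'),
}
for doc_id, count, texts in search(docs, "bonjour"):
    print(doc_id, count, texts)
```

Document "c" drops out because its only "bonjour" sits inside a note, and "a" ranks above "b" because it has two matching utterances.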