KWIC/Lucene problems resolved
Take this paragraph:
<p>Some text in a paragraph <note>blah blah note blah blah</note> about something important. </p>
and a Lucene index configuration that includes:
<ignore qname="tei:note"/>
My assumption was that, when using KWIC, I would not get back any text from inside the note element, because I had excluded notes from my index. This is not the case: searches using KWIC consistently returned text from inside the note element. As far as I can tell, this is because kwic:expand expands the entire root node (in this case <p>), which includes the <note>, and only then turns it into plain text.
The result I was looking for was:
"...paragraph about something important"
but what I got was:
"...note blah blah something important"
Fortunately, KWIC is written in XQuery, so I re-created the kwic module and edited the get-context, truncate-previous and truncate-following functions to ignore notes, using the predicate [not(ancestor::tei:note)].
I'm now using my GKWIC module, which only prints text nodes that are *not* descendants of a tei:note element.
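To show what that predicate does on its own, here is a minimal sketch. It is not the code from my GKWIC module or from the stock kwic module: local:text-without-notes is a hypothetical helper made up for this illustration, and I'm assuming the sample paragraph is in the TEI namespace.

```xquery
xquery version "3.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper, for illustration only (not part of the kwic module):
   joins the text nodes below $node, skipping any text node that has a
   tei:note ancestor. :)
declare function local:text-without-notes($node as node()) as xs:string {
    string-join($node//text()[not(ancestor::tei:note)], "")
};

(: Assumption: the sample paragraph from above, placed in the TEI namespace. :)
let $p :=
    <tei:p>Some text in a paragraph <tei:note>blah blah note blah blah</tei:note> about something important. </tei:p>
return
    (: normalize-space collapses the double space left where the note was :)
    normalize-space(local:text-without-notes($p))
```

This should return "Some text in a paragraph about something important." with the note text dropped. Adding the same predicate to each place where the kwic functions select text nodes is what keeps the note text out of the context.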