kwic/lucene problems resolved

17/01/12

03:01:37 pm, by Greg
Categories: Activity log; Mins. worked: 120


Take this paragraph:

<p>Some text in a paragraph
<note>blah blah note blah blah</note>
about something important.
</p>

and a Lucene index configuration which includes:

<ignore qname="tei:note"/>

My assumption was that, when using kwic, I would not get back any text inside the note tag, because I had excluded notes from my index. This is not the case: searches using kwic consistently returned text from inside the note element. As far as I can tell, this is because kwic:expand expands the entire root node (in this case <p>), which includes the <note>, and only THEN turns it into plain text.

The result I was looking for was:
"...paragraph about something important"
but what I got was:
"...note blah blah something important"

Fortunately, KWIC is written in XQuery, so I made my own copy of the kwic module and edited its get-context, truncate-previous and truncate-following functions to skip notes, using the predicate [not(ancestor::tei:note)].

I'm now using my GKWIC module to print only text nodes that are *not* descendants of a tei:note element.
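
To illustrate what that predicate does, here is a minimal sketch in plain XQuery. It is not the GKWIC code itself: local:text-outside-notes is a hypothetical helper made up for this example, and I'm assuming the sample paragraph lives in the TEI namespace so that tei:note matches.

```xquery
xquery version "1.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper, not part of the kwic module: joins every text node
   under $root that has no tei:note ancestor. :)
declare function local:text-outside-notes($root as element()) as xs:string {
    normalize-space(
        string-join($root//text()[not(ancestor::tei:note)], " ")
    )
};

(: The sample paragraph from above, placed in the TEI namespace (assumption). :)
let $p :=
    <p xmlns="http://www.tei-c.org/ns/1.0">Some text in a paragraph
    <note>blah blah note blah blah</note>
    about something important.
    </p>
return
    <result>
        <!-- string value of the whole element: the note text is included -->
        <plain>{ normalize-space(string($p)) }</plain>
        <!-- only text nodes outside of any tei:note -->
        <filtered>{ local:text-outside-notes($p) }</filtered>
    </result>
```

The <plain> value still contains "blah blah note blah blah", which is the behaviour I was seeing from kwic, while <filtered> should read "Some text in a paragraph about something important."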
