Myths on Maps

kwic/lucene problems resolved

Posted by gregster on 17 Jan 2012 in Activity log

Take this paragraph:

<p>Some text in a paragraph
<note>blah blah note blah blah</note>
about something important.
</p>

and a Lucene index configuration that includes:

<ignore qname="tei:note"/>

My assumption was that, when using kwic, I would not get back any text from inside the note element, because I had excluded notes from my index. This is not the case: searches using kwic consistently returned text from inside the note element. As far as I can tell, this is because kwic:expand expands the entire root node (in this case <p>), which includes the <note>, and only THEN turns it into plain text.

The result I was looking for was:
"...paragraph about something important"
but what I got was:
"...note blah blah about something important"

Fortunately, KWIC is written in XQuery, so I re-created the kwic module and edited the get-context, truncate-previous and truncate-following functions to ignore notes, using the predicate [not(ancestor::tei:note)].

I'm now using my GKWIC module, which only prints text nodes that are *not* descendants of a tei:note element.
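
For anyone who wants to see the difference without an eXist instance to hand, here is a rough sketch in Python (not XQuery, and not the GKWIC code itself). It assumes the sample paragraph from above with no namespace on the elements, and the function names all_text and text_outside are made up for the illustration. The first function flattens everything under the paragraph, which is roughly the behaviour I was seeing; the second skips any text that sits inside a note, which is what the [not(ancestor::tei:note)] predicate does.

```python
import xml.etree.ElementTree as ET

# The sample paragraph from this post (namespace omitted for simplicity).
SAMPLE = (
    "<p>Some text in a paragraph\n"
    "<note>blah blah note blah blah</note>\n"
    "about something important.\n"
    "</p>"
)


def all_text(elem):
    """Flatten every text node under elem, notes included."""
    return " ".join("".join(elem.itertext()).split())


def text_outside(elem, skip_tag="note"):
    """Collect only the text nodes that have no skip_tag ancestor."""
    parts = []

    def walk(node):
        if node.tag == skip_tag:
            return  # skip the note and everything inside it
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:
                # Tail text follows the child but belongs to the parent,
                # so it is kept even when the child itself was skipped.
                parts.append(child.tail)

    walk(elem)
    return " ".join("".join(parts).split())


p = ET.fromstring(SAMPLE)
print(all_text(p))      # includes the note text
print(text_outside(p))  # note text left out
```

The second function is the Python stand-in for filtering text nodes by ancestor; ElementTree has no ancestor axis, so the sketch gets the same effect by simply not descending into note elements.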

