Implemented a crude similarity metric in XQuery
Lucene-based fuzzy matching seems to be badly broken in the build of eXist I'm using, and in any case it's based on Levenshtein distance, so I've implemented a crude version of the USM/NCD algorithm in XQuery. It's a long way from ideal, though, because it compresses base64-encoded versions of the strings rather than the strings themselves (this is all I can do with the gzip access eXist exposes). Using zip instead looks prohibitively expensive, because it would mean creating a file on the filesystem or in the db and then compressing that. I think a simpler approach would be to take my Java class, strip out all the command-line handling it contains, and call it directly from XQuery (see the xqSearchUtils Java project, and the way it's called from the Despatches XQuery, for an example). A jar file with a simple XQuery module interface might be very handy indeed.
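For reference, here is a minimal sketch of what the stripped-down Java side might look like. It is not the existing class: the name NcdSketch, its method names, and the choice of gzip as the compressor are all assumptions made for illustration. It uses the usual NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and it compresses the raw UTF-8 bytes rather than a base64 encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the name and API are illustrative only.
final class NcdSketch {

    // Returns the gzip-compressed size of the given bytes, in bytes.
    static int compressedLength(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail in practice.
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer is included.
        return buffer.size();
    }

    // Normalised compression distance between two strings.
    // Values near 0 mean "very similar"; values near 1 mean "unrelated".
    static double distance(String x, String y) {
        byte[] xBytes = x.getBytes(StandardCharsets.UTF_8);
        byte[] yBytes = y.getBytes(StandardCharsets.UTF_8);

        // Concatenate the two byte arrays to get C(xy).
        byte[] xyBytes = new byte[xBytes.length + yBytes.length];
        System.arraycopy(xBytes, 0, xyBytes, 0, xBytes.length);
        System.arraycopy(yBytes, 0, xyBytes, xBytes.length, yBytes.length);

        int cx = compressedLength(xBytes);
        int cy = compressedLength(yBytes);
        int cxy = compressedLength(xyBytes);

        return (cxy - Math.min(cx, cy)) / (double) Math.max(cx, cy);
    }
}
```

A call such as NcdSketch.distance("first string", "second string") returns a double. For very short strings the fixed gzip header and trailer make up a large share of every compressed length, so the scores are only a rough guide and are best used for ranking candidates against each other.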