The new 3.0 release has a namespace bug which can be worked around with a little refactoring; since the refactoring actually produces better code, I've done it for several projects, including Mariage. I've also reworked the search functionality so that it handles the problem case of a large document with hundreds of hits. I've fixed other layout and style bugs too, and added a couple of obvious words to the stopword list.
Made a number of tweaks to the way the search currently works, but principally worked on generic code in the hcmc/xquery/xq-utils.xqm library to convert user-friendly search-box input into the XML syntax that eXist can use to talk to Lucene. This seems to be working well, although I haven't yet found a way to put it into practice because we're still using a string-construct-and-eval approach to filtered queries. It may be just a case of using the XQuery serialize() function.
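Here is a minimal Python sketch of that kind of conversion, to make the idea concrete. It is not the XQuery code in xq-utils.xqm. It assumes three conventions: a bare word is required, a word prefixed with "-" is excluded, and a trailing "*" marks a wildcard. The function name `search_to_lucene_xml` is hypothetical. The output follows the shape of eXist's XML query syntax for Lucene as I understand it: a `<query>` wrapping a `<bool>` whose children carry `occur` attributes.

```python
import xml.etree.ElementTree as ET


def search_to_lucene_xml(search: str) -> str:
    """Convert search-box input such as 'amour -mort mari*' into an XML
    query string (hypothetical helper; conventions assumed, see below).

      - bare word               -> <term occur="must">
      - word prefixed with '-'  -> <term occur="not">
      - word ending in '*'      -> <wildcard occur="...">
    Quoted phrases are not handled.
    """
    query = ET.Element("query")
    bool_el = ET.SubElement(query, "bool")
    for token in search.lower().split():
        occur = "must"
        if token.startswith("-"):
            occur = "not"
            token = token[1:]
        if not token:
            continue  # a lone '-' carries no term
        tag = "wildcard" if token.endswith("*") else "term"
        el = ET.SubElement(bool_el, tag, occur=occur)
        el.text = token
    # Serialize the element to a plain string (no XML declaration).
    return ET.tostring(query, encoding="unicode")
```

The final `tostring` call plays the role that serialize() might play in the XQuery version. It turns the query element into a string that could be spliced into a string-constructed query.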
As I hack away at search testing, I'm discovering more and more little tweaks that are more than nice-to-have. Today I fixed a bunch of bugs in processing of ambitious search strings (quoted phrases are not supported yet, although I have half-a-plan for that). I also decided that search-string highlighting in a document that you have found is better done using a much simpler search string than the one you used to find documents in the collection (for instance, you don't want minused terms in the document highlighter because it causes eXist to return nothing, for some reason). So I now have a clever conversion of the original search string that is appended to the URL of the document link in the initial search results.
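A rough Python sketch of the idea follows. It drops the minused (excluded) terms from the collection search string and appends what remains to the document link as a URL parameter. The parameter name `highlight` and both function names are hypothetical; the real conversion may do more than this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def highlight_terms(search: str) -> str:
    """Keep only the positive terms of a collection search string;
    '-'-prefixed terms are dropped because they break highlighting."""
    return " ".join(t for t in search.split() if not t.startswith("-"))


def doc_link(doc_url: str, search: str) -> str:
    """Append the simplified search string to a document link as a
    query parameter (the name 'highlight' is hypothetical)."""
    parts = urlsplit(doc_url)
    params = parse_qsl(parts.query)  # keep any existing parameters
    params.append(("highlight", highlight_terms(search)))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `doc_link("doc.htm", "amour -mort")` gives `doc.htm?highlight=amour`.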
I've also fixed the display of the gravures so that a search result link will pop up the containing annotation, and also so that a link to the id of an element which is not an annotation itself, but is inside one, will cause the annotation to be shown.
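The lookup is to walk up from the linked element until you reach an annotation. Here is a Python sketch of that logic using ElementTree. It assumes, hypothetically, that annotations are marked with `type="annotation"` and that link targets carry `xml:id`s. The real implementation runs in the browser and will differ.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def annotation_to_show(root: ET.Element, target_id: str):
    """Return the xml:id of the annotation to display for a link to target_id:
    the target itself if it is an annotation, otherwise its nearest annotation
    ancestor, or None if it is not inside one."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    el = next((e for e in root.iter() if e.get(XML_ID) == target_id), None)
    while el is not None:
        if el.get("type") == "annotation":  # hypothetical marker
            return el.get(XML_ID)
        el = parent.get(el)
    return None
```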
We're clearly down to minor tweaks at this stage, so we're close. PS is still working on a couple of cosmetic issues. I'm thinking that there should be some more sophisticated diagnostics to catch broken links; I don't think the current link check is catching links that point to an element in a document which is not one of the ref docs.
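A stricter check would resolve each pointer against the ids of the specific document it names, not against a general pool of ids. Here is a Python sketch of that check. It assumes pointers take the forms `doc#id`, `#id` (same document) or `doc`, and that the ids in each document have already been collected. The function name and data shapes are hypothetical.

```python
def find_broken_links(ids_by_doc, links):
    """ids_by_doc: {doc_name: set of xml:ids in that document}
    links: iterable of (source_doc, pointer), pointer being 'doc#id', '#id' or 'doc'.
    Returns a list of (source_doc, pointer, reason) for unresolvable links."""
    broken = []
    for source, pointer in links:
        doc, _, frag = pointer.partition("#")
        doc = doc or source  # '#id' points within the same document
        if doc not in ids_by_doc:
            broken.append((source, pointer, "no such document"))
        elif frag and frag not in ids_by_doc[doc]:
            # The id may exist elsewhere, but not in the document the link names.
            broken.append((source, pointer, f"no id '{frag}' in {doc}"))
    return broken
```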
PS is working on the styling of the results page, and fixing a bug in scrolling of marginal page-numbers in normalized documents; I've fixed some other bits and pieces related to search, parameterized the build process so that I can easily build a full eXist XAR (1.4GB) locally without making Jenkins do it, and tested the big XAR on a local eXist (it works well). We're getting closer.
I think this was the last piece of the puzzle for the Mariage eXist app. I haven't yet tested building the complete webapp; I'll do that soon. Meanwhile, there's one issue regarding the display of the gravures that I'm working with PS on.
I've implemented search result caching in Mariage, and done a bunch more work to bring it up to speed with what I learned in the Graves project. However, I'm now faced with a problem in search design which also afflicts MoEML, summarized here:
Imagine you want to find "amour" in your documents. You search for "amour".
It finds (say) thirty documents which contain "amour". It returns the first ten (it's paging in sets of ten results), and it sets about giving you all the keyword-in-context display results for each document.
Now, the first document has 200 instances of "amour". So the search code has to do a kwic expand operation on all 200 of those results in order to give you 200 keyword-in-context fragments for that document. These operations take a long time, so it takes ages for your results to come back.
If your results page contains ten documents, each of which has 200 hits, you're now processing 2,000 hits to give a single page of results.
In the Graves project, this isn't an issue, because all the documents are tiny (one diary entry). But in Mariage and MoEML, we have a combination of very small (one poem, one little article) and very large (Satire Menipée, Stow) documents.
One option is that instead of returning all the hits for a document, you just return (say) the first five, with a note "195 more", and the option to search only that document. If you take that option, you see hits only from that document, but paged in sets of ten.
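Here is a Python sketch of this first option. It pages documents in tens but keeps only the first few hits per document for KWIC expansion, along with a count of the rest. The data shapes and names are hypothetical, and hits are just positions standing in for whatever eXist returns.

```python
from dataclasses import dataclass


@dataclass
class DocSummary:
    doc: str
    shown: list  # hits to expand into KWIC snippets
    more: int    # hits not shown ("195 more")


def summarize_page(hits_by_doc, page, per_page=10, max_hits=5):
    """hits_by_doc: list of (doc, [hit, ...]) in result order.
    Only the first max_hits hits of each document on the page are kept."""
    start = (page - 1) * per_page
    return [DocSummary(doc, hits[:max_hits], max(0, len(hits) - max_hits))
            for doc, hits in hits_by_doc[start:start + per_page]]
```

In the imaginary scenario of thirty documents with 200 hits each, the first page then needs 10 × 5 = 50 KWIC expansions rather than 2,000.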
Another option is to treat the search as a search of the collection itself, so that every hit is a separate "result"; in that case, in our imaginary scenario, the first 200 hits (i.e. the first 20 pages of results) come from the first large document, and you have to get to page 21 before you see anything from the next document.
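This second option amounts to flattening every (document, hit) pair and paging over the flat sequence. Here is a Python sketch (names hypothetical). It uses a lazy generator, so only the hits on the requested page need to be expanded.

```python
from itertools import islice


def flat_page(hits_by_doc, page, per_page=10):
    """Treat every hit as a separate result: walk (doc, hit) pairs in
    document order and return just the requested page of them."""
    flat = ((doc, hit) for doc, hits in hits_by_doc for hit in hits)
    start = (page - 1) * per_page
    return list(islice(flat, start, start + per_page))
```

With a 200-hit document followed by a small one, pages 1 to 20 come entirely from the first document, and page 21 is the first to show the second. The same function also covers "search only that document" from the previous option, if you pass it a single document's hits.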
Another option is to search at the granularity of smaller fragments rather than full-scale documents (Stow chapters, etc.). The problem with that can be seen in this example, where search results from the same play are scattered around because each scene is searched as if it were a separate document.
I have a vague notion that you might let users search "FOR DOCUMENTS" (in which case they'd get summaries with the first one or two hits, with documents ordered by hit count) or "IN DOCUMENTS" (in which case each individual hit in a document would be a separate "result" on the page). But I'm not sure how easy that would be for users to understand.
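Here is a Python sketch of how a single search function might switch between the two modes. The mode names "for" and "in", the function name, and the data shapes are all hypothetical. Ties in hit count keep their original order because Python's sort is stable.

```python
from itertools import islice


def search_page(hits_by_doc, mode, page, per_page=10, preview=2):
    """mode 'for': one result per document, most hits first, each with the
    first `preview` hits and a count of the remainder.
    mode 'in': one result per hit, in document order."""
    start = (page - 1) * per_page
    if mode == "for":
        ranked = sorted(hits_by_doc, key=lambda dh: len(dh[1]), reverse=True)
        return [(doc, hits[:preview], max(0, len(hits) - preview))
                for doc, hits in ranked[start:start + per_page]]
    if mode == "in":
        flat = ((doc, hit) for doc, hits in hits_by_doc for hit in hits)
        return list(islice(flat, start, start + per_page))
    raise ValueError(f"unknown mode: {mode!r}")
```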
Fixed the bug where the docTitle was repeated at the beginning of introductory fragment documents. Also created an application icon for use in the XAR file.
Current solution tested and working on Chromium and Firefox.
As of today:
Lots of work still to do, including making search highlighting apply to document display, and making image search retrieve an image "fragment".
I erroneously gave CC the instruction to use pc/@force="weak" for hyphens that should be retained, and she misunderstood and added it to hyphens that should be dropped. This resulted in a bit of unnecessary encoding in a short text. I've fixed the text, revised the instructions and tweaked the XSLT handling, and with luck everything should work for the next text.
Should one marry? Panurge's question proves unavoidable in the West, especially from the Counter-Reformation onward. From the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, the attempt to renew marriage in France ran up against the monarchy's growing intervention in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous, but it fostered the proliferation of the documents that are the subject of this site: "the nuptial imaginary" is made up of various textual genres, each with its own character, but all dealing with the fears, desires and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new questions surrounding conjugal union. For the moment, the emphasis is on the misogamous texts and images that form part of a revival of the Querelle des femmes during the first 25 years of the seventeenth century.