CC rewrote the About page, which prompted me to move those site pages to XML rather than continuing to maintain them in HTML. The whole dataset is now TEI.
There was some fallout from yesterday's footnote work, so I've fixed some rogue notes which should have been other things, and tweaked the XSLT. When I went to update last night, the process failed because we were out of disk space; jetty-instances had only 30GB. RE has now doubled that, and I was able to update Mariage with no problem.
Fixed a number of things today:
- Footnotes are now working even when embedded in split lists. I've added an earlier step which puts their number in @n.
- The "Tous" TOC now includes gravures, which it didn't before.
- The dates in TOCs correctly reflect the certainty and ranges of unknown dates.
- XML files are now displaying.
- The X-Frame-Options header is set to DENY.
There is a notorious problem in converting TEI lists to HTML: if a list contains embedded things (such as forme works or page breaks) that will produce non-list element content, you have to split the list into separate component lists, because leaving that content inside the list results in invalid HTML. I had some code that was supposed to be doing this with for-each-group, but it wasn't working. I debugged and fixed it, so lists are now coming out OK, but there's now a problem with notes embedded in these structures. Because they're being processed as part of a constructed fragment, they lose their context, so they all end up numbered "1" and don't generate a popup correctly. This is exemplified in the TOC for Forest Nuptiale. Possible solutions:
- Do a first pass to pre-process notes to give them an @n attribute, then do the note processing based on that attribute rather than on counting preceding notes.
- Do a first pass to split out the lists in XML, so that the problem doesn't arise.
- Instead of using the very limited ol/ul/li elements in XHTML5, use simple divs, with display: list-item on the items.
I'm still thinking on this.
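To make the first option concrete, here is a minimal sketch of the pre-numbering idea. It is in Python rather than XSLT purely for illustration; the function name and the sample document are made up, and it assumes notes are numbered in a single document-wide sequence, which may not match how the real stylesheets count them.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NOTE = f"{{{TEI_NS}}}note"
ITEM = f"{{{TEI_NS}}}item"


def number_notes(root):
    """First pass: store each note's document-order position in its n attribute."""
    for count, note in enumerate(root.iter(NOTE), start=1):
        note.set("n", str(count))
    return root


# Hypothetical sample: a list interrupted by a page break, with notes inside it.
SAMPLE = f"""<text xmlns="{TEI_NS}">
  <p>Intro<note>first</note></p>
  <list>
    <item>One<note>second</note></item>
    <pb/>
    <item>Two<note>third</note></item>
  </list>
</text>"""

root = number_notes(ET.fromstring(SAMPLE))

# Later processing can read n from the note itself, so an item that has been
# lifted out into a constructed fragment still reports the right number.
fragment_numbers = [
    note.get("n") for item in root.iter(ITEM) for note in item.iter(NOTE)
]
```

The point of the design is that the number travels with the note, so nothing downstream has to count preceding notes in a tree that no longer contains them.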
CC pointed out a number of flaws in the way both primary source and normalized versions are being rendered. The previous site had an assumption that title page contents were centred; we want to make that explicit in texts, but then handle it, so I've added a handler for the titlePage element. Where possible, flow content in paras in normalized texts should be justified, so I've made that happen by adding a class on the root div which enables us to apply override styles for normalization display. I fixed some encoding errors in a couple of texts, and I've also tweaked a bunch of the CSS. We're getting closer to a publishable version now.
The new 3.0 has a bug with namespaces which can be worked around by refactoring a bit; since the refactoring actually produces better code, I've done it for several projects including Mariage. I've also reworked the search functionality so that it handles the problem case of a large document with hundreds of hits. Other layout and style bugfixes also done, and a couple of obvious things added to the stopword list.
Made a number of tweaks to the way the search currently works, but principally worked on generic code in the hcmc/xquery/xq-utils.xqm library to convert user-friendly search-box input into the XML syntax that eXist can use to talk to Lucene. This seems to be working well, although I haven't yet found a way to put it into practice because we're still using a string-construct-and-eval approach to filtered queries. It may be just a case of using the XQuery serialize() function.
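As a rough illustration of that conversion (not the code in xq-utils.xqm, which is XQuery), the sketch below turns search-box input into the query/bool/term XML structure that eXist's Lucene integration accepts. The input conventions are assumptions: a leading + means required, a leading - means excluded, anything else is optional, and a term containing * or ? is treated as a wildcard. The function name is made up.

```python
import xml.etree.ElementTree as ET


def to_lucene_xml(search_box_input):
    """Build an eXist-style XML Lucene query from simple +/- search-box input."""
    query = ET.Element("query")
    bool_el = ET.SubElement(query, "bool")
    for token in search_box_input.split():
        if token.startswith("+"):
            occur, text = "must", token[1:]
        elif token.startswith("-"):
            occur, text = "not", token[1:]
        else:
            occur, text = "should", token
        if not text:
            continue  # a bare '+' or '-' carries no term
        # Assumed rule: terms with * or ? become <wildcard>, the rest <term>.
        name = "wildcard" if ("*" in text or "?" in text) else "term"
        el = ET.SubElement(bool_el, name, occur=occur)
        el.text = text
    # Serializing a tree, rather than concatenating strings, escapes & and <.
    return ET.tostring(query, encoding="unicode")


example = to_lucene_xml("+mariage -femme nopc*")
```

Building the query as a tree and serializing it at the end is the same idea as constructing the XML in XQuery and serializing it for the eval step, instead of assembling it by string concatenation.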
As I hack away at search testing, I'm discovering more and more little tweaks that are more than nice-to-have. Today I fixed a bunch of bugs in processing of ambitious search strings (quoted phrases are not supported yet, although I have half-a-plan for that). I also decided that search-string highlighting in a document that you have found is better done using a much simpler search string than the one you used to find documents in the collection (for instance, you don't want minused terms in the document highlighter because it causes eXist to return nothing, for some reason). So I now have a clever conversion of the original search string that is appended to the URL of the document link in the initial search results.
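The simplification described above might look something like this sketch, again in Python for illustration only. It assumes the same +/- conventions as the search box, and the URL parameter name "highlight" and the document name are invented; the real conversion may well do more than this.

```python
from urllib.parse import urlencode


def highlight_terms(search_string):
    """Reduce a collection-level search string to plain terms for highlighting."""
    kept = []
    for token in search_string.split():
        if token.startswith("-"):
            continue  # excluded terms have nothing to highlight, so drop them
        term = token.lstrip("+")
        if term:
            kept.append(term)
    return " ".join(kept)


def result_link(doc_url, search_string, param="highlight"):
    """Append the simplified string to a document link (parameter name is hypothetical)."""
    simplified = highlight_terms(search_string)
    if not simplified:
        return doc_url
    return f"{doc_url}?{urlencode({param: simplified})}"


link = result_link("doc.xml", "+mariage -femme nopces")
```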
I've also fixed the display of the gravures so that a search result link will pop up the containing annotation, and also so that a link to the id of an element which is not an annotation itself, but is inside one, will cause the annotation to be shown.
We're clearly down to minor tweaks at this stage, so we're close. PS is still working on a couple of cosmetic issues. I'm thinking that there should be some more sophisticated diagnostics to catch broken links; I don't think the current check finds links that point to an element in a document which is not one of the ref docs.
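One way such a diagnostic could work, sketched in Python under several assumptions: links live in target attributes as space-separated pointers of the form doc.xml#id or #id, every document in the collection is checked in the same way (ref doc or not), and external URLs are skipped. The function names and sample documents are invented, and this is not the project's existing check.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_ids(docs):
    """Map each document name to the set of xml:id values it declares."""
    return {
        name: {el.get(XML_ID) for el in ET.fromstring(xml).iter() if el.get(XML_ID)}
        for name, xml in docs.items()
    }


def broken_links(docs):
    """Return (source document, pointer) pairs whose target cannot be found."""
    ids = collect_ids(docs)
    problems = []
    for name, xml in docs.items():
        for el in ET.fromstring(xml).iter():
            for pointer in (el.get("target") or "").split():
                if "://" in pointer:
                    continue  # external URL, outside the scope of this check
                doc_part, _, fragment = pointer.partition("#")
                doc_part = doc_part or name  # '#foo' points into the same document
                if doc_part not in ids or (fragment and fragment not in ids[doc_part]):
                    problems.append((name, pointer))
    return problems


# Hypothetical two-document collection with one dangling pointer.
docs = {
    "a.xml": '<TEI><p xml:id="p1"><ref target="b.xml#n2">x</ref>'
             '<ref target="#p1 b.xml#missing">y</ref></p></TEI>',
    "b.xml": '<TEI><note xml:id="n2">n</note></TEI>',
}
report = broken_links(docs)
```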
PS is working on the styling of the results page, and fixing a bug in scrolling of marginal page-numbers in normalized documents; I've fixed some other bits and pieces related to search, parameterized the build process so that I can easily build a full eXist XAR (1.4GB) locally without making Jenkins do it, and tested the big XAR on a local eXist (it works well). We're getting closer.
I think this was the last piece of the puzzle for the Mariage eXist app. I haven't yet tested building the complete webapp; I'll do that soon. Meanwhile, there's one issue regarding the display of the gravures that I'm working with PS on.