The build was broken because the wonderful NVU validator was finding errors in the HTML5 output that originated in bad attribute values in the original XML (@xml:lang and @target values mainly). I've now tracked down all of those and fixed them, and also caught some other problems with head elements inside lists. The build is now working again, and I'm now writing XSLT to generate a really stripped-down version of the website, produced by transforming the existing output and designed for rapid crawling by the archive-it crawler.
I began with the assumption that I could just pull in every linked reference into the XML of a specific document, and thus create complete coherent docs; but this is not the best approach, because references link to each other recursively in this project; one might very easily end up with the entire reference collection embedded in many documents. Therefore I've created an indexing system that builds an index to the 6,000+ items (references, biblio items and tile images) that are not explicitly linked because they're accessed through JS; and I've included an invisible link to that file in the footer of most of the front pages of the site. CD's crawler is now working on this version of the site, and we'll see if it does the job or not.
The results from the crawl by CP answer my initial two questions very clearly:
The AJAX fails because the crawler doesn't process JavaScript and doesn't attempt to retrieve the fragments.
The zoomable images fail for the same reason.
Also, I notice that the crawler inserts its own JavaScript (/wb-static/js/ait-client-rewrite.js and //five.partner.archive-it.org/static/AIT_Analytics.js). We'll have to check that this doesn't actually break existing JS on the pages. It does appear to interfere with the operation of the browser's history object; this is not likely to be a problem for this project, but GN is in the middle of developing something which makes active use of recent ECMAScript functionality for manipulating history, so this is definitely something to be aware of.
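For context, the history functionality in question is presumably the pushState/popstate pattern; a generic sketch (not GN's actual code; the state object and URL are invented):

// Push an application state onto the browser history without a page load.
history.pushState({panel: 'refs', item: 'r123'}, '', '?panel=refs&item=r123');
// Restore the UI when the user navigates back or forward.
window.addEventListener('popstate', function(event) {
  if (event.state) {
    // e.g. re-open the panel recorded in the state object
    console.log('Restoring panel', event.state.panel);
  }
});

An injected analytics or rewriting script that also touches the history object could easily step on this sort of thing, hence the need to check.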
I'll have to go back to my rendering pipeline and make everything a standalone complete file, with all its references embedded. Might as well do that in the XML, prior to generating the HTML.
Fixed all 131 instances where divs had been interrupted by labels. This had to be done manually, since the contexts were all so diverse that no XSLT or regex approach could be made to work. I have over 300 to do in Le Bon Mariage. :-(
Started work on cleaning up and clarifying the layout of primary source texts in the static rendering output. There's quite a bit more to do, but I have something that looks cleaner and more distinct from the rest of the site already. I also found a number of encoding problems in some texts and fixed them; and one more issue prevalent in Le Bon Mariage and Le Forest Nuptiale, which is isolated by this XPath:
//label[@type='marginal'][following-sibling::*[self::div or self::p][matches(., '^\s*[\-a-z]')]]
These are instances where the encoder has erroneously closed a para and its div before inserting a marginal label, then re-opened the div and para in the middle of a sentence. These need to be collapsed so that the label appears inline. I've confirmed that making this change will not affect the rendering on the current live site, but it seems impossible to fix this with a regex, and the XSLT needed to do it will be a mite thorny. Will need careful testing.
Fixed a bunch of oddities in xml:id values in the image markup files; also made additions to the build process and output to enable links to the XML source of each document.
Much progress with the image wrapper. Popups are working; layout is right; the overview control is in place; zone selection on the image and in the nav panel are now integrated; zone selection on the panel zooms the image; zone selection on the image now expands and focuses the panel item; and selected zones now have their own colours in their outlines. The only core programming thing left is to split out the "showing" status into "showing" and "mousedover". After that there are issues such as abstracting captions and so on.
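For the record, the kind of per-zone state split I have in mind looks something like this (property and method names are placeholders, not the wrapper's actual code):

// Hypothetical sketch: track selection ("showing") and hover ("mousedover") separately.
function Zone(id) {
  this.id = id;
  this.showing = false;    // zone has been selected and is displayed
  this.mousedover = false; // pointer is currently over the zone
}
Zone.prototype.updateStyle = function() {
  // With two flags, hover styling no longer clobbers selection styling.
  if (this.showing) { /* apply the selected-zone style */ }
  else if (this.mousedover) { /* apply the hover style */ }
  else { /* apply the default style */ }
};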
I now have a nav panel for the categories and zones, and after much wrestling I discovered how to use bind() to attach a method of an instance object to a DOM element as an event handler, with a contextual parameter. Some progress with layout too, with a bit of help from GN (use of vh for image height is perfect).
This is bringing my more advanced JS skills up to date, so very useful. Closures and especially the use of bind() will be extremely handy in future.
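For future reference, the pattern looks roughly like this (the element id, class and method names are invented for the example):

// Hypothetical sketch: bind an instance method to a DOM element's click event,
// pre-filling a contextual parameter (the zone id).
function ZoneViewer() {
  this.selectedZone = null;
}
ZoneViewer.prototype.selectZone = function(zoneId, event) {
  // "this" is the ZoneViewer instance; zoneId was fixed when bind() was called.
  this.selectedZone = zoneId;
};
var viewer = new ZoneViewer();
var navItem = document.getElementById('zoneNavItem_1');
navItem.addEventListener('click', viewer.selectZone.bind(viewer, 'zone_1'));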
I now have the wrapper reading both zones and categories from JSON or from an object, and I have styles being created appropriately. I've also revisited previous approaches to having zones show up when you mouse over them, and come up with a better system which causes much less activity on mousemoves and doesn't cause flickering. This is a big step forward. Clicking now also appropriately selects a zone. Next I think I'll elaborate the style system so that instead of creating styles, it uses functions, so I can handle text size and line widths based on resolution (if necessary); and after that we have to approach the issue of constructing a category/zone navigation box.
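The essence of the quieter mouseover behaviour is simply to ignore mousemoves that don't change which zone is under the pointer; a minimal sketch (getZoneAt, highlightZone and unhighlightZone are hypothetical helpers, not the wrapper's real API):

var imageDiv = document.getElementById('zoomifyImage'); // hypothetical container id
var currentZone = null;
imageDiv.addEventListener('mousemove', function(evt) {
  var zone = getZoneAt(evt.clientX, evt.clientY); // hit-test: which zone is under the pointer?
  if (zone === currentZone) { return; }           // same zone (or still no zone): do nothing
  if (currentZone) { unhighlightZone(currentZone); }
  if (zone) { highlightZone(zone); }
  currentZone = zone;
});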
A couple of lessons learned: it's important to pass the correct "this" into callback functions; even if they're called from the context of a method on your object, they won't know their context unless you pass "this" as the (optional) "this" parameter when setting up the callback.
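For example, the built-in iteration methods take an optional thisArg; inside a method the pattern is something like this (zoneList and redrawZone are hypothetical names):

// Without the second argument to forEach, "this" inside the callback
// would not be the instance (it would be undefined in strict mode).
ZoneViewer.prototype.hideAllZones = function() {
  this.zoneList.forEach(function(zone) {
    zone.showing = false;
    this.redrawZone(zone); // works because we passed "this" below
  }, this); // the optional "this" (thisArg) parameter
};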
Continued work on a project I started at the weekend: creating a relatively simple object-based wrapper library to encapsulate the functionality we typically want from our OpenLayers Zoomify images. The idea is that you should just be able to make a couple of calls to the API and get your working Zoomify image with zones and basic functionality out of the box; you could then extend it using the prototype approach.
It's also partly a project for me to get more familiar with modern JavaScript practices and work out which of the many approaches to defining and instantiating objects (with namespacing) are most effective and appropriate for our work; and how we might handle (for instance) supporting both a traditional constructor and a constructor which parses object literal notation at the same time. It's going well so far.
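As a very rough sketch of the dual-constructor idea (the names here are invented, not the wrapper's real API):

// Accept either a traditional argument list or a single options object.
function ZoomifyWrapper(target, imageUrl, width, height) {
  if (arguments.length === 1 && typeof target === 'object') {
    // Called as: new ZoomifyWrapper({target: 'map', imageUrl: '...', width: w, height: h})
    var opts = target;
    target = opts.target; imageUrl = opts.imageUrl;
    width = opts.width; height = opts.height;
  }
  // Called as: new ZoomifyWrapper('map', '...', w, h)
  this.target = target;
  this.imageUrl = imageUrl;
  this.width = width;
  this.height = height;
}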