Today I've added a couple of new features and fixed an annoying bug in the diagnostics; at this point, I'm not aware of anything more that's broken or missing there.
I've also had to rewrite some of the OCR build process, because recently OCRed material was coming out slightly borked: one word per line, instead of nicely lineated. The cause turns out to be a change in the way Tesseract works. It now seems to produce a wider range of line-like span classes, some of which I've never seen before, classifying some poetic lines as captions and others as callouts; it also now produces indented XHTML at the line level, which adds extra returns. It took a little tweaking to fix, and I'll have to keep an eye on it. I've also added a control parameter to the OCR process that lets you overwrite any existing OCR in a file. Normally we don't want to do that, because we may be OCRing a collection just because a couple of new items have been added to it, and we don't want to have to redo all the others; but in cases where the process went wrong in some way, it's just what we need. 180 minutes.
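For the record, here is a rough Python sketch of the two ideas above; it is not the actual build code. It treats several span classes as lines, collapses the whitespace that the indented XHTML introduces, and skips files that already have OCR unless an overwrite flag is set. The class names in `LINE_CLASSES` are my assumption about what the hOCR output contains and should be checked against real files; `extract_lines`, `should_ocr` and the `overwrite` parameter are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XHTML = "{http://www.w3.org/1999/xhtml}"

# Assumed set of line-like span classes; adjust to match the real OCR output.
LINE_CLASSES = frozenset({"ocr_line", "ocr_caption", "ocr_textfloat", "ocr_header"})


def extract_lines(hocr_text, line_classes=LINE_CLASSES):
    """Return one string per line-like span, with words joined by single spaces."""
    root = ET.fromstring(hocr_text)
    lines = []
    for span in root.iter(XHTML + "span"):
        classes = set(span.get("class", "").split())
        if classes & line_classes:
            # itertext() includes the indentation and returns between word
            # spans; split() and join() collapse all of it to single spaces.
            text = " ".join("".join(span.itertext()).split())
            if text:
                lines.append(text)
    return lines


def should_ocr(output_path, overwrite=False):
    """Run OCR if there is no existing output, or if overwrite was requested."""
    return overwrite or not Path(output_path).exists()
```

With this approach a line that gets classified as a caption still comes out as a single line of text, and leaving `overwrite` at its default keeps the usual behaviour of processing only the newly added items.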