Encoded a few more pages. Moving steadily along.
Encoded the whole of the actual translation, mainly using s/r in gedit, because it's so well structured. Seems to be rendering well. Did a few more pages of the intro, and confirmed that there are no conventional endnotes; the Notes section is keyed to the stanza numbers.
For the new translation we definitely need the <caesura> element, so I have:
- Manually added the element to the RNG file which I'm now maintaining as the P4 schema.
- Added it to the content model of the
- Added handling for it in the output to XHTML5 and PDF, and tested.
- Added handling for it in the P5 output, and tested.
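As a rough illustration of what the XHTML handling amounts to (the real work is done in the project's XSLT, which isn't reproduced here), this is a minimal Python sketch. It assumes a verse line is encoded as an <l> element containing an empty <caesura/>; the class names "line" and "caesura" and the em-space gap are hypothetical choices, not the project's actual output.

```python
import xml.etree.ElementTree as ET


def line_to_xhtml(l_elem):
    """Render a verse line <l> as <span class="line">, turning each
    <caesura/> into a visible gap. Other child elements are flattened
    to their text content to keep the sketch short."""
    out = ET.Element("span", {"class": "line"})
    out.text = l_elem.text or ""
    last_gap = None  # most recent caesura span, if any
    for child in l_elem:
        if child.tag == "caesura":
            gap = ET.SubElement(out, "span", {"class": "caesura"})
            gap.text = "\u2003"  # em space; CSS could style the gap instead
            gap.tail = child.tail or ""
            last_gap = gap
        else:
            text = "".join(child.itertext()) + (child.tail or "")
            if last_gap is None:
                out.text += text
            else:
                last_gap.tail += text
    return out


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the translation.
    sample = ET.fromstring("<l>First half-line<caesura/>second half-line</l>")
    print(ET.tostring(line_to_xhtml(sample), encoding="unicode"))
```

Emitting a dedicated span rather than literal spaces leaves the width of the gap to the stylesheet, so the same markup can be styled differently for screen and for print.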
I've set up the framework file for the new translation, which is huge, and done a bunch of search-and-replace by style in an ODT version to cut down on the encoding work. We already have one font problem: one of the headers (which are bold) includes the title of a work (which is italic), and the bold/italic font that we fall back to is impoverished with regard to Unicode character ranges, so the o with ogonek is lost. I've had to add an extra tag (<hi rend="normal">) around that title where it appears in these contexts, to unset the bold for that specific text run. I think I've had to do this before.
NVD sent proofing corrections for her review, and it turned out I'd built it before adding the final para. That's now done. Then set up the master file for Volume 24, as well as two empty placeholders for the editorial intros, and built the result to see how long it is (168 pages). It's still missing one translation, so it'll be more substantial in the end.
Minor editorial corrections for Lingard translation done; new review by NVD encoded. Because the site is small, my current publication method is simply to rebuild the entire webapp and push it up to eXist.
I used Tika to create XHTML versions of the OCRed volumes, and integrated the results into the existing search; it now seems to work pretty well, so I pushed it to the live jetty.
The headers for the translation itself were appearing at the bottom of a page in the PDF. I set up a special case of div0[@type='NewPage'] which the XSLT now handles to force a page-break in this case. Republished the XAR.
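The rule itself lives in the XSLT, but the logic is simple enough to sketch in Python. This assumes the PDF is generated from HTML plus CSS, so the break is expressed as page-break-before; if the pipeline used XSL-FO instead, the equivalent would be break-before="page". The function names and the choice of <h2> for the heading are hypothetical.

```python
import xml.etree.ElementTree as ET


def page_break_divs(root):
    """Return the div0 elements flagged to start on a new page."""
    return root.findall(".//div0[@type='NewPage']")


def to_html_div(div0):
    """Convert a div0 to an XHTML-style <div>, forcing a page break
    before it when it carries type='NewPage'. Only the heading is
    copied across, to keep the sketch short."""
    attrs = {}
    if div0.get("type") == "NewPage":
        attrs["style"] = "page-break-before: always"
    html_div = ET.Element("div", attrs)
    head = div0.find("head")
    if head is not None:
        heading = ET.SubElement(html_div, "h2")
        heading.text = "".join(head.itertext())
    return html_div


if __name__ == "__main__":
    # Hypothetical document fragment.
    doc = ET.fromstring(
        "<text><body>"
        "<div0 type='intro'><head>Introduction</head></div0>"
        "<div0 type='NewPage'><head>Translation</head></div0>"
        "</body></text>"
    )
    for div0 in doc.iter("div0"):
        print(ET.tostring(to_html_div(div0), encoding="unicode"))
```

Keying the break to an explicit type value keeps the decision in the encoding, so other div0 elements are unaffected.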
I've extracted the editorial content from the master volumes, and I'm now building a separate index page for each of the volumes we've created, including the editorial stuff. On the other listings pages, volume numbers are now links to the volume pages. This answers (I think) the last of HT's requirements for the new site, other than a redesign.
I've also tested Apache Tika with the old volume PDFs, and the results are very promising; I think we may be able to process them to ugly XHTML, which eXist could then index, and provide people with search capabilities and linking out to the specific page in which the hit is found.
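To make the page-linking idea concrete, here is a minimal Python sketch of locating hits by page. It assumes the XHTML wraps each PDF page in a <div class="page"> in the XHTML namespace, which is how Tika's PDF output usually looks but is worth verifying against the real files. In eXist this would be done with XQuery and its full-text index rather than Python; pages_with_term is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def pages_with_term(xhtml_text, term):
    """Return the 1-based page numbers whose text contains `term`
    (case-insensitive), counting <div class="page"> elements in
    document order."""
    root = ET.fromstring(xhtml_text)
    hits = []
    page_no = 0
    for div in root.iter(XHTML_NS + "div"):
        if div.get("class") != "page":
            continue
        page_no += 1
        text = "".join(div.itertext())
        if term.lower() in text.lower():
            hits.append(page_no)
    return hits


if __name__ == "__main__":
    # Hypothetical stand-in for the XHTML of one OCRed volume.
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<div class="page"><p>alpha</p></div>'
        '<div class="page"><p>beta Alpha</p></div>'
        '<div class="page"><p>gamma</p></div>'
        "</body></html>"
    )
    print(pages_with_term(sample, "alpha"))
```

The page number can then be appended to a link to the PDF, since most viewers accept a #page=N fragment.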