The new translation is done, thanks to numerous hacky shortcuts for converting ODT to TEI, including search-and-replace on the content.xml file. The results highlighted a couple of minor layout annoyances, so I've also fixed those. The document comes in at about 100 pages. It's now posted for proofing.
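For the record, the search-and-replace shortcut can be scripted rather than done by hand. What follows is only a rough sketch of the idea, not the exact procedure I used: an ODT file is a zip archive whose body text lives in content.xml, so you can copy the archive and run a regex over that one member. The function name, the "Foreign" style name and the <foreign> target element in the usage note are hypothetical examples.

```python
import io
import re
import zipfile


def rewrite_odt_content(odt_bytes: bytes, pattern: str, replacement: str) -> bytes:
    """Return a copy of an ODT archive with a regex substitution applied to content.xml."""
    source = zipfile.ZipFile(io.BytesIO(odt_bytes))
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(out_buffer, "w") as target:
        # Copy members in their original order, reusing each ZipInfo so that
        # per-member settings (e.g. the uncompressed "mimetype" entry) carry over.
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                text = re.sub(pattern, replacement, text)
                data = text.encode("utf-8")
            target.writestr(item, data)
    return out_buffer.getvalue()
```

Something like rewrite_odt_content(data, r'<text:span text:style-name="Foreign">(.*?)</text:span>', r'<foreign>\1</foreign>') would turn spans carrying a (hypothetical) "Foreign" character style into something closer to TEI. Regex on XML is fragile, of course, which is why I call these shortcuts hacky.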
Received a new translation and started working on it. I've used the LibreOffice macro search tool to good effect, enabling me to add some tagging in a semi-automated way to the word-processor doc using styles, and I'm now transferring that content into the XML document. The metadata is done, the bibliography is done, and I'm working through the editorial intro.
Simple redirect XQuery module that handles both the PDFs and the HTMLs now implemented.
Worked a lot on the search today, constraining it to published documents only, and tweaking how it returns results. I've also added VNU validation of the HTML to the build process, and fixed some problems arising out of that; I've turned popup notes into <aside> elements so that their inline text can be ignored by the indexer, while the footnote rendering at the bottom of the document will be indexed; and I've refined the indexing after using the monex profiler. I also tweaked the P5 output so that it validates with the correct schema links. I think we're more or less there now; what we have is already much better than what's on the site, and I see no outstanding issues at all.
I'm now happy with the way everything is working. There are tweaks I could make -- I need to put in place redirects for the old URLs, and I should revisit the collection.xconf and indexing, and there will be more pages that need to be created for the new site -- but it's all basically there, and what we have now could replace the current webapp immediately. Will start that process next week.
I've added an eXist app build to the process and I'm now testing and bugfixing locally to bring this project thoroughly into the Endings fold. Everything is basically working, but I'm finding I now want to enhance some aspects of the rendering and display so that it's a cleaner and simpler setup than the previous app; I'm adding citation stuff in a footer, as well as getting the search working, based on Mariage. Should all be done soon.
The new static build process is now complete, including all the DC header stuff, the teiHeader display widget, and the search page (which of course won't do anything until it's in the context of a webapp). With HT's approval, I have now switched all the old URLs over to the new on the existing site, and added 301 redirects through the sitemap.xmap. See the relevant stanzas there to see how it's done; I doubt we'll ever need to do this sort of thing again with such an old Cocoon, but it took me a while to figure out. I still need to add validation to the build process.
I've given up trying to figure out how XEP is working on Peach (or why changes to its configuration don't work), and although I'll keep implementing those changes, I've now moved over to a system where PDFs are pre-built and uploaded to Peach, and just serialized from the sitemap. I've also put new URLs in place alongside the old, so you can go to scancan.net/arthur_1_24.htm and get the HTML page, as well as pdf/arthur_1_24.pdf to get the PDF and similarly for the XML. I'm waiting to hear from HT about whether we should switch over to using these URLs in the site as a whole, and send out 301s for the old ones. Had to re-learn some Cocoon foo to do all this, but it looks like it might be quite straightforward to manage a transition to the new URL structure on the old site, and run it for a good while that way, then roll out a new Jetty/eXist as with other Endings projects when we're ready.
I've done all the corrections to the Arthur translation, including:
- Removing the numbering
- Spacing out the separate sections
- Moving the notes from the English side to the original text
- Two textual corrections the translator asked for
and the PDF is now all good from the point of view of character display.
Some unusual Unicode chars were not rendering properly in the new translation. I discovered that they were not available in the old Gentium fonts we were using, but they were supported in the latest 2014 versions, so I downloaded those and reconfigured xep.xml to get them working locally. When I went to do the same on Peach, I discovered that although it appears that XEP is running in cocoon-legacy/resources/xep, and that's certainly where it's looking for its fonts (which I proved by moving some temporarily), adding the new fonts and changing xep.xml there doesn't solve the problem. It seems that XEP must actually be running somewhere else. There is a version at /home1t/tapor/xep, and that might be the one whose xep.xml needs to be updated, but that file is not group-writable and we don't have the login for the tapor user. I've asked sysadmin to make it gw so I can experiment with changing that one. But the oddest thing is that both the xep batch files on the server are pointing at Java 1.6, which is not there, so either there's yet another installation, or it's being run directly by the Jar file somehow. We need to get away from this dependency, perhaps even sooner than I had planned.
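A quick way to see which characters are likely to cause this kind of trouble is to list every distinct non-ASCII character in the text along with its Unicode name. This is just a minimal sketch with a hypothetical function name; it only inventories the characters, and checking them against what a particular font actually supports would need a font library, which is not shown here.

```python
import unicodedata


def unusual_characters(text: str) -> dict[str, str]:
    """Map each distinct non-ASCII character in text to its Unicode name."""
    found = {}
    for char in text:
        # Anything above code point 127 is outside plain ASCII.
        if ord(char) > 127 and char not in found:
            # Fall back to a placeholder for code points with no assigned name.
            found[char] = unicodedata.name(char, "UNNAMED")
    return found
```

Running this over the plain text of a document gives a short list to compare against the font's documented coverage before rebuilding the PDF.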
Also did some XSLT work to allow for side-by-side translations that don't need to have their rows numbered, which is the case for the new text, and fixed a couple of other things. More to do there.