HCMC Journal

Ben Jonson 2025-12-08 to 2025-12-12

to : Martin Holmes
Minutes: 565

On Monday, started a proper plan for remediating the folder structure of the documents; wrote some XSLT to harvest the paginated listings into a single listing, but did the Blog section index manually because the order is important and there’s no way to infer it.

On Tuesday, started putting the plan into action: moved the Essays section from its nested position into the root, and fixed all the links. The same approach will be used for all the other sections and subsections, so that the chaos is mitigated. I also discovered that the character encoding of the original site is borked (or at least, the server is not serving it correctly), so there were UTF-8 borkitudes all over the texts; fixed all the ones I’ve identified so far.
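A minimal sketch of how this sort of encoding damage can be undone, assuming the usual cause: UTF-8 bytes that were decoded as Windows-1252 somewhere along the way. The function name `repair_mojibake` and the choice of `cp1252` are assumptions for illustration, not a record of what was actually run on these files.

```python
def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse UTF-8 text that was mistakenly decoded as a single-byte encoding.

    Assumption: the damage came from reading UTF-8 bytes as `wrong_encoding`.
    Re-encoding with that encoding recovers the original bytes, which can
    then be decoded correctly as UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text is not mojibake of this kind (it may already be correct,
        # or only partly damaged), so return it unchanged rather than guess.
        return text
```

Because the function leaves anything it cannot round-trip untouched, it is safer to apply it to short runs of text (a paragraph or a text node) than to a whole file, where a single clean non-ASCII character would stop the repair of everything else.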

On Wednesday, worked on the Masques and Literary Records and got those all working; fixed a couple of errors in the Essays. Gradually I’m getting through the reorganization. At the end of the day, I discovered that the largest set of listings pages, the one for the Performance Archive, was only partially retrieved by the crawl: its pages don’t link to all the others in the set, so the crawler never found some of them. Those pages will have to be retrieved again; started writing some code to do that.
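As a rough illustration of how gaps like this can be spotted (not the code actually written here), the sketch below compares the page numbers found in a set of crawled filenames against the expected range. The filename pattern `page_N.html` and the function name `missing_pages` are hypothetical.

```python
import re


def missing_pages(retrieved_names, last_page, pattern=r"page_(\d+)\.html$"):
    """Return the page numbers in 1..last_page that have no retrieved file.

    `pattern` is a hypothetical naming scheme; its first group must capture
    the page number.
    """
    found = set()
    for name in retrieved_names:
        match = re.search(pattern, name)
        if match:
            found.add(int(match.group(1)))
    # Anything in the expected range that was never seen is a gap in the crawl.
    return [n for n in range(1, last_page + 1) if n not in found]
```

This only works when the total number of pages is known in advance, for example from the pagination control on the first listing page.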

On Thursday, wrote some Python to retrieve all the listings files for the Performance Archive from the site, made them well-formed, then wrote XSLT to build them into a single table. Then wrote a quick script to retrieve the complete set of 1412 performance files, remediated those, and then replaced my partial set with them. Moved the performance collection to the root as I’ve been doing with other things, then remediated remaining issues. Then moved on to fixing the last two or three problems with the Records site section, all of which is now working properly. What’s left is Bibliography, Chronology, and Music; after that I need to get the listings filters working; and finally we will have to consider what to do about the Works (perhaps these should be replaced with a message about their being remediated for LEMDO publication?).
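The retrieve-then-combine approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the scripts actually used: it does the merge in Python rather than XSLT, the URL template and the names `fetch_listing_pages` and `merge_rows` are hypothetical, and it assumes the pages have already been made well-formed and carry no XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_listing_pages(url_template, page_count, fetch=None):
    """Return the text of every listing page, numbered 1..page_count.

    URLs are generated from the template instead of being discovered by
    following links, so pages that nothing links to are still retrieved.
    `fetch` can be replaced with a stub for testing.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url) as response:
                return response.read().decode("utf-8")
    return [fetch(url_template.format(page=n)) for n in range(1, page_count + 1)]


def merge_rows(pages):
    """Collect every <tr> from each well-formed page into one <table>."""
    table = ET.Element("table")
    for page in pages:
        root = ET.fromstring(page)  # raises ParseError if a page is not well-formed
        for row in root.iter("tr"):
            table.append(row)
    return table
```

A call might look like `merge_rows(fetch_listing_pages("https://example.org/listing?page={page}", 40))`, where both the URL and the page count are placeholders. Header rows repeated on every page would be merged along with the data rows, so they would need filtering in a real run.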

On Friday, handled the Bibliography and fixed more stray links as I came across them.