
HCMC Journal

LEMDO 2024-07-22 to 2024-07-26

Martin Holmes
Minutes: 1020

On Monday, got a fresh version of the EMDP database, which is substantially bigger than the previous one. Generated XML from it, and then re-applied all the character-encoding fixes we made last week to the old version. Found some very odd content, which I wrote to HC to ask about.
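The entry doesn't say what the character-encoding fixes were. One common problem in database dumps is mojibake, where UTF-8 bytes were decoded as Windows-1252 (so "é" shows up as "Ã©"). Assuming that was among the issues, a minimal repair might look like the sketch below. The function name `fix_mojibake` is hypothetical. Strings that don't round-trip cleanly are returned unchanged, so correctly encoded text is left alone.

```python
def fix_mojibake(s: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    Hypothetical helper: re-encode the string as cp1252 bytes and decode
    those bytes as UTF-8. If either step fails, the text probably wasn't
    mojibake of this kind, so return it unchanged.
    """
    try:
        return s.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


# Usage: garbled text is repaired, clean text passes through.
print(fix_mojibake("cafÃ©"))  # garbled input
print(fix_mojibake("café"))   # already correct
```

Because a real dump can mix clean and garbled text, applying a function like this field by field, rather than to the whole file at once, is probably safer.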

We also got access to the XML from the Ben Jonson collection, and I did a preliminary analysis of it. There is a lot to deal with.
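For a first look at an unfamiliar XML collection, an inventory of element names and their frequencies is often a useful starting point. Below is a minimal sketch, assuming the files are well-formed XML. The function name is hypothetical, and this isn't necessarily how the analysis here was done. Namespace prefixes are stripped so TEI and non-TEI files are counted together.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_inventory(xml_texts):
    """Count element names (namespace stripped) across a collection of XML documents."""
    counts = Counter()
    for xml_text in xml_texts:
        root = ET.fromstring(xml_text)
        for el in root.iter():  # walks the root and all its descendants
            tag = el.tag
            if isinstance(tag, str) and "}" in tag:
                tag = tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            counts[tag] += 1
    return counts


docs = [
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><p>a</p><p>b</p></text></TEI>',
    "<doc><p/></doc>",
]
for name, n in element_inventory(docs).most_common():
    print(n, name)
```

For real files on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.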

On Tuesday, set up the Windows laptop so I could start debugging the schema build process on Windows, and started some methodical search-and-replace operations on the EMDP DB dump XML.
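One way to keep a long series of search-and-replace operations methodical is to keep them as an ordered list of rules and record how many matches each rule makes. That way a rule that suddenly matches zero or thousands of times stands out. Here is a minimal sketch. Both rules shown are hypothetical examples, not the actual EMDP fixes.

```python
import re

# Hypothetical example rules: each is (pattern, replacement), applied in order.
RULES = [
    (r"&amp;(amp|lt|gt);", r"&\1;"),  # undo double-escaped entities
    (r"[ \t]+\n", "\n"),              # strip trailing spaces and tabs
]


def apply_rules(text, rules=RULES):
    """Apply regex rules one at a time, recording how many matches each made."""
    counts = []
    for pattern, replacement in rules:
        text, n = re.subn(pattern, replacement, text)
        counts.append((pattern, n))
    return text, counts


out, counts = apply_rules("a &amp;amp; b  \nc")
for pattern, n in counts:
    print(n, pattern)
```

Order matters here: later rules see the output of earlier ones, so the list doubles as a record of the sequence in which the fixes were applied.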

On Wednesday, made huge progress with the EMDP material, converting about half a million of the pseudo-tags and other markup devices to a temporary XML format, which is then converted to valid TEI. Many issues remain, and some cannot be solved completely because of overlapping hierarchies. I can handle what's left using milestones, and then try to implement a milestone-to-container phase later.
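The milestone approach, with a later milestone-to-container phase, can be sketched roughly as follows. This is not the actual EMDP conversion. The `[name]`/`[/name]` pseudo-tag syntax, the temporary `<ms>` element, and both function names are hypothetical, and the input is assumed to be text that is already XML-escaped. Phase 1 turns every pseudo-tag into an empty milestone, which is always well-formed. Phase 2 uses a stack to find start/end pairs that nest cleanly and turns only those into containers, leaving overlapping or unpaired ones as milestones.

```python
import re

# Hypothetical pseudo-tag syntax: [name] opens a range, [/name] closes it.
PSEUDO_TAG = re.compile(r"\[(/?)([A-Za-z]+)\]")
# Hypothetical temporary milestone format produced by phase 1.
MILESTONE = re.compile(r'<ms n="([A-Za-z]+)" t="(start|end)"/>')


def to_milestones(text):
    """Phase 1: replace every pseudo-tag with an empty milestone element."""
    def repl(m):
        kind = "end" if m.group(1) else "start"
        return f'<ms n="{m.group(2)}" t="{kind}"/>'
    return PSEUDO_TAG.sub(repl, text)


def milestones_to_containers(text):
    """Phase 2: convert cleanly nested milestone pairs into container elements;
    overlapping or unpaired milestones are left as they are."""
    tokens = []  # ("text", str) or ("ms", (name, start_or_end, original_markup))
    pos = 0
    for m in MILESTONE.finditer(text):
        tokens.append(("text", text[pos:m.start()]))
        tokens.append(("ms", (m.group(1), m.group(2), m.group(0))))
        pos = m.end()
    tokens.append(("text", text[pos:]))

    stack = []     # token indices of currently open start milestones
    paired = set()  # token indices of milestones that become container tags
    for i, (kind, value) in enumerate(tokens):
        if kind != "ms":
            continue
        name, t, _ = value
        if t == "start":
            stack.append(i)
        elif stack and tokens[stack[-1]][1][0] == name:
            paired.update((stack.pop(), i))  # clean nesting: convert both ends
        else:
            # Overlap: drop the nearest matching open start (if any) from the
            # stack, so that start and this end both remain milestones.
            for j in range(len(stack) - 1, -1, -1):
                if tokens[stack[j]][1][0] == name:
                    del stack[j]
                    break

    out = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "text":
            out.append(value)
        else:
            name, t, original = value
            if i in paired:
                out.append(f"<{name}>" if t == "start" else f"</{name}>")
            else:
                out.append(original)
    return "".join(out)


print(milestones_to_containers(to_milestones("a [i]b[/i] c")))
print(milestones_to_containers(to_milestones("[a]x[b]y[/a]z[/b]")))
```

In the overlapping example, the `b` range becomes a container while the `a` range stays as a pair of milestones. The output is still well-formed because milestones are empty elements.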

On Thursday, added some components to the LEMDO schema to cover new features used in EMDP, and then started validating the TEI output against the LEMDO schema. Starting from several thousand errors, I've been whittling them down, finding and fixing inconsistencies or overlaps in the original source, and we're getting very close to being ready for remediation work.

On Friday, remediated many more of the square-bracketed tags, and met with JJ and NH to decide how much further to go before leaving the remaining remediation to the RAs.