Ben Jonson 2025-11-03 to 2025-11-10
To: Illya Nokhrin
Minutes: 660
Worked on the XSLT conversion of the modernized-text editions from CEWBJ to LEMDO.
The main text of the play now converts and very nearly validates against the LEMDO schema, with @status on revisionDesc set to prgGenerated.
The only remaining validation error is that CEWBJ uses EPS files for images, whereas LEMDO allows only JPG or PNG files. We will probably need to convert the images to one of those formats once we have access to them.
The annotation files are now also converting fairly well. The main challenge has been quotation marks, which can be numerous and nested in complex ways that are hard to parse, even with helper functions. For example:
Settling . . . fixing . . . subsiding ‘three related chemical terms. “Settle” was used of dregs and impurities separating out as scum or sediment, “fix” of the congealing of liquids or volatile spirits, and “subside” of the precipitation of a sediment. Lady Politic is especially proud of the last technicality, and the “as’twere” is a cue for admiration; it was probably a rare or new word, as it is not recorded by OED earlier than 1646’ (Creaser, Volp.).
Here, the combination of several nested quotations and the apostrophe in “as’twere” (the same character as the closing single quotation mark) defeats attempts to convert the quotations programmatically. Realistically, notes like this may have to be fixed by hand, but I will see whether the helper functions can be improved to handle such cases.
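One heuristic the helper functions could try, sketched here in Python rather than XSLT and not taken from the existing stylesheets: treat a right single quotation mark (’) with a letter on both sides as an apostrophe, then pair the remaining marks with a stack so that each quotation gets a nesting depth. The function names are hypothetical. It will still misread word-initial elisions such as ’tis and plural possessives such as plays’ as closing quotes, so notes containing them still need a manual check.

```python
from typing import List, Tuple

# Hypothetical helpers, not the project's actual XSLT functions.
OPENERS = {"‘": "’", "“": "”"}   # opening mark -> matching closing mark
CLOSERS = {"’": "‘", "”": "“"}   # closing mark -> matching opening mark


def is_apostrophe(text: str, i: int) -> bool:
    """Treat ’ with a letter on both sides (e.g. as’twere) as an apostrophe."""
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    return before.isalpha() and after.isalpha()


def find_quotations(text: str) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Pair curly quotation marks with a stack.

    Returns (spans, unmatched): each span is (start, end, depth), where depth 0
    is the outermost quotation; spans are listed in the order they close.
    unmatched holds the indices of marks that could not be paired.
    """
    stack: List[Tuple[str, int]] = []
    spans: List[Tuple[int, int, int]] = []
    unmatched: List[int] = []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((ch, i))
        elif ch in CLOSERS:
            if ch == "’" and is_apostrophe(text, i):
                continue  # apostrophe, not a quotation mark
            if stack and stack[-1][0] == CLOSERS[ch]:
                _, start = stack.pop()
                spans.append((start, i + 1, len(stack)))
            else:
                unmatched.append(i)  # closer with no matching opener
    # Any openers still on the stack were never closed.
    unmatched.extend(start for _, start in stack)
    return spans, sorted(unmatched)
```

Run on the Volpone note above, this should pair the four double-quoted terms at depth 1 inside the single outer quotation at depth 0, with nothing left unmatched; the resulting spans could then be wrapped in whatever quotation element LEMDO expects.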
The other issue is that CEWBJ encodes its collation notes in a print- or display-oriented format rather than a structurally encoded one, so lemmas and witnesses are not tagged. I have written functions that convert relatively straightforward collation notes into LEMDO's format. I will keep working on this, but anything more complex will likely need to be transformed and encoded by hand.
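A minimal sketch of the straightforward case, in Python rather than XSLT, assuming a hypothetical note shape of the form "lemma] F; variant Q1, Q2" (lemma, closing bracket, the lemma's witnesses, then semicolon-separated readings each followed by their sigla) and a known list of sigla. The function names and the note format are illustrative assumptions, not CEWBJ's actual conventions, and the TEI-style app/lem/rdg output may differ from LEMDO's exact collation encoding. Anything that does not fit the assumed shape raises an error, so the note can be set aside for manual encoding.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def split_witnesses(segment: str, sigla: Iterable[str]) -> Tuple[str, List[str]]:
    """Peel known sigla off the end of 'reading WIT1, WIT2'.

    Returns (reading, [WIT1, WIT2]); only sigla in the supplied list count.
    """
    known = set(sigla)
    tokens = segment.split()
    wits: List[str] = []
    while tokens and tokens[-1].rstrip(",") in known:
        wits.insert(0, tokens.pop().rstrip(","))
    return " ".join(tokens), wits


def note_to_app(note: str, sigla: Iterable[str]) -> ET.Element:
    """Convert a simple 'lemma] WIT; reading WIT' note into a TEI-style <app>.

    The note shape is a hypothetical example; anything that does not fit it
    raises ValueError so the note can be set aside for manual encoding.
    """
    sigla = list(sigla)  # allow a generator to be reused for every segment
    if "]" not in note:
        raise ValueError(f"no lemma bracket in {note!r}")
    lemma, rest = note.split("]", 1)
    segments = [s.strip() for s in rest.split(";") if s.strip()]
    if not segments:
        raise ValueError(f"no witnesses in {note!r}")

    # First segment: witnesses supporting the lemma, with no extra text.
    extra, lem_wits = split_witnesses(segments[0], sigla)
    if extra or not lem_wits:
        raise ValueError(f"unexpected lemma segment {segments[0]!r}")
    app = ET.Element("app")
    lem = ET.SubElement(app, "lem", wit=" ".join("#" + w for w in lem_wits))
    lem.text = lemma.strip()

    # Remaining segments: one variant reading each, followed by its sigla.
    for seg in segments[1:]:
        reading, wits = split_witnesses(seg, sigla)
        if not reading or not wits:
            raise ValueError(f"cannot parse reading {seg!r}")
        rdg = ET.SubElement(app, "rdg", wit=" ".join("#" + w for w in wits))
        rdg.text = reading
    return app
```

Passing the sigla in explicitly, rather than guessing them from capitalization, keeps capitalized words inside a reading (a name such as Sir Politic, for instance) from being mistaken for witnesses.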