I've been working on generating JSON files in support of the search, mostly with the aim of supporting the search filters, as we do in other projects like Keats. The work is going well, but what I'm not really clear about yet is where to draw the line between convenience and file size; for example, I can include details of the periodical title and folder path with every poem retrieved under any category, or I could include only an id and have a small JSON file for periodical information which I use to look that info up when needed. Still thinking about all of this.
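To make the convenience versus file size trade-off concrete, here is a minimal Python sketch of the two layouts. Everything in it is an assumption for illustration: the field names (`periodicalTitle`, `folder`, `periodical`), the ids, and the sample data are invented and are not the project's actual JSON structure.

```python
import json

# Hypothetical periodical lookup; ids, titles and folder paths are made up.
periodicals = {
    "p1": {"title": "Example Periodical A", "folder": "periodicals/example_a"},
    "p2": {"title": "Example Periodical B", "folder": "periodicals/example_b"},
}

# Hypothetical poem records that carry only a periodical id.
poems = [
    {"id": f"poem{i}", "title": f"Poem {i}", "periodical": "p1" if i % 2 else "p2"}
    for i in range(100)
]


def denormalized(poems, periodicals):
    """Option 1: repeat the periodical title and folder path on every poem."""
    result = []
    for poem in poems:
        info = periodicals[poem["periodical"]]
        record = {k: v for k, v in poem.items() if k != "periodical"}
        record["periodicalTitle"] = info["title"]
        record["folder"] = info["folder"]
        result.append(record)
    return result


def normalized(poems, periodicals):
    """Option 2: poems keep only the id; periodical details live in one lookup."""
    return {"poems": poems, "periodicals": periodicals}


def json_size(data):
    """Size in bytes of the compact JSON serialization."""
    return len(json.dumps(data, separators=(",", ":")).encode("utf-8"))
```

Comparing `json_size(denormalized(poems, periodicals))` with `json_size(normalized(poems, periodicals))` on real data would show how much the repeated details actually cost; the price of the smaller form is one extra lookup per poem on the client side.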
Today's meeting was productive, and following it I implemented a process for reporting on the main rhyme-scheme of a poem and variant stanzas that don't follow it. This threw up a couple of issues which I fixed with Schematron and cleanup. I've also re-worked the way the page-image pops up when you mouse over it, and parameterized the URLs used in building poems so that on Jenkins we should (if all goes well) get proper relative links to other site pages, whereas on the local build for encoders, you get fixed links to the Jenkins build. 360 minutes.
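A rough sketch of the rhyme-scheme report idea, in Python. It assumes that each stanza's scheme has already been reduced to a letter string such as "abab", and it treats the most frequent scheme as the main one; both are assumptions made for illustration, and the function name `report_rhyme_schemes` is hypothetical rather than anything in the project's build.

```python
from collections import Counter


def report_rhyme_schemes(stanza_schemes):
    """Return (main_scheme, variants).

    stanza_schemes is a list of scheme strings, one per stanza, in order.
    The main scheme is taken to be the most frequent one (an assumption);
    variants is a list of (stanza_number, scheme) pairs, numbered from 1,
    for every stanza that does not follow the main scheme.
    """
    if not stanza_schemes:
        return None, []
    counts = Counter(stanza_schemes)
    main_scheme, _ = counts.most_common(1)[0]
    variants = [
        (number, scheme)
        for number, scheme in enumerate(stanza_schemes, start=1)
        if scheme != main_scheme
    ]
    return main_scheme, variants
```

Called on `["abab", "abab", "aabb", "abab"]`, this reports "abab" as the main scheme and flags stanza 3 as a variant.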
In our usual weekly meeting we talked about a few things, including the common font-style: italics typo, for which I added a Schematron rule; the asterisk line encoding pattern, which was hastily conceived by me and basically silly, so I've replaced it with a saner and more extensible approach, with changed rendering; and various issues with rhyme. I also fixed a bunch of first-line issues in the db, and re-worked the diagnostic which discovers those problems so that it doesn't trigger on anything like as many false positives; and I rebuilt the TEI for Once a Week, which has lots of new content. 210 minutes.
Macs can't display the black triangle Unicode characters, so I switched to plus and minus signs for the TOC.
I've now finished and documented the rhyme-finding tool, and the team are testing it. Meanwhile, there's a need to be able to nimbly merge some components of the metadata db into a small subset of the TEI files -- specifically, a single year for a single periodical -- to allow indexing fixes and updates to be propagated on a folder-by-folder basis so the encoding can proceed without running the whole massive operation. I've therefore modularized that process so that it can now be called with parameters for periodical folder and year, and I tested the result successfully with Chambers 1840, which is next on the encoding list. I did the same thing to the OCR process, which usually needs to be run after the db merge process anyway. This will make life easier going forward. 240 minutes.
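The general shape of a process parameterized by periodical folder and year can be sketched in Python as below. This is only an illustration: the row structure, the field names, and the functions `select_rows` and `merge_subset` are hypothetical, and the real merge is not necessarily implemented this way or in this language.

```python
# Hypothetical metadata rows; real db records will look different.
rows = [
    {"poem_id": "a1", "periodical": "chambers", "year": 1840},
    {"poem_id": "a2", "periodical": "chambers", "year": 1840},
    {"poem_id": "a3", "periodical": "chambers", "year": 1841},
    {"poem_id": "b1", "periodical": "once_a_week", "year": 1840},
]


def select_rows(rows, periodical, year):
    """Keep only the rows for one periodical folder and one year."""
    return [
        row for row in rows
        if row["periodical"] == periodical and row["year"] == year
    ]


def merge_subset(rows, periodical, year, merge_one):
    """Run merge_one on each selected row and return how many were processed.

    merge_one stands in for whatever updates a single TEI file; passing it
    as a parameter lets the same selection step drive other per-folder
    processes as well.
    """
    selected = select_rows(rows, periodical, year)
    for row in selected:
        merge_one(row)
    return len(selected)
```

For example, `merge_subset(rows, "chambers", 1840, print)` would touch only the two Chambers 1840 rows and leave everything else alone.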
Had to meld together four long poems into one, as a result of an indexing error that had to be corrected. In the process, I worked out a way to use the What Rhymes With functionality to find candidate existing tagged rhymes inside the poem you're currently working on, which should help speed up the rhyme labelling for longer poems. I'll show the team tomorrow. 240 minutes.
Tagged some echo figures in 1840 Chartist, and in the process refined the detection algorithm a bit to cope with full-stanza echoes.
Met with K & K and talked through our process regarding imperfect rhymes; wrote up the result in our documentation. Got PS to help with debugging a particular rendering issue with inverted line wraps, then documented the method of encoding those in the schema. Fixed a couple of bugs reported by KAF, and started thinking through a process that might help encoders detect when rhymes in a long poem are echoes of rhymes earlier on.