...for our DHSI course. Steady progress. 120 minutes.
SK and I determined that most of the Python diagnostics are no longer running, and that one of the two still running is obsolete, so it's now disabled. The remaining one is badly flawed, but it's better than nothing, pending re-implementation in XSLT. 60 minutes.
Today I prepped and ran the changes required to replace the old echo figures taxonomy with the new sonic devices taxonomy.
I also set up a new interface which enables any encoder to refresh the metadata in their XML file from the most recent repo copy of the SQL db; in the process, I found and fixed more minor bugs in the generateSvnChanges.sh script. Also, 1860 Once A Week has now been refreshed and OCRed, and other odd years have been refreshed as part of my testing.
Finally, I did a bit more work on the Porter2 stemmer.
480 minutes.
(Posting hours from yesterday). Met with GL to catch up and to discuss future plans. Plotted out some actions and then sent out a few questions regarding the TINA_NNM.
Time: 90 min
Late duty.
Did some work in the new OJS interface, and worked out some oddities there with UBlock Origin. Prepped the Measures article and passed it to AT for encoding. 120 minutes.
It's a bit slow because of lots of footnotes with links to page refs in other articles. 90 minutes.
There is now a Hashtags table in the database, where you can define new hashtags and edit existing ones. Each hashtag record has a hashtag, a gloss (short explanation) and a description (long explanation). I've populated these based on my understanding of the current set of hashtags.
The diagnostics now make use of this table instead of a hard-coded list of tags, so you can add a new tag and start using it immediately; the diagnostics may take a day or so to catch up with the change, so expect to see a few errors until the changed info from the db has made it into the diagnostics process.
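As a rough illustration of the table-driven check described above, here is a minimal Python sketch. It is not the actual diagnostics code: the in-memory SQLite database, the table and column names, the sample hashtags and the function names are all assumptions made for the example.

```python
import sqlite3


def load_known_hashtags(conn):
    """Return the set of hashtags currently defined in the (assumed) hashtags table."""
    rows = conn.execute("SELECT hashtag FROM hashtags").fetchall()
    return {row[0] for row in rows}


def unknown_hashtags(used, known):
    """Return, sorted, any hashtags that are used but not defined in the table."""
    return sorted(set(used) - known)


def demo_connection():
    """Build a throwaway in-memory db standing in for the real one (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hashtags (hashtag TEXT PRIMARY KEY, gloss TEXT, description TEXT)"
    )
    # Made-up sample records, purely for illustration.
    conn.executemany(
        "INSERT INTO hashtags VALUES (?, ?, ?)",
        [
            ("translation", "Translated poem", "The poem is a translation."),
            ("illustrated", "Has illustration", "The poem has an illustration."),
        ],
    )
    return conn


if __name__ == "__main__":
    known = load_known_hashtags(demo_connection())
    # "dialect" is not in the sample table, so it is reported as unknown.
    print(unknown_hashtags(["translation", "dialect"], known))
```

Because the list of known tags is read from the table at run time, adding a record there is enough to make a new tag acceptable; nothing in the checking code needs to change.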
The schema build process now also takes advantage of this info:
- The db table data is incorporated automatically as a taxonomy element in the taxonomies.xml file.
- The taxonomies from that file are incorporated into the ODD file to provide the list of acceptable values for catRef/@target attributes.
- Eventually (not happening yet), the TEI poem files, when their metadata is refreshed from the db, will have catRef elements added to them for any hashtags appearing in them.
- Also eventually, the web view of a poem will list any hashtags with their glosses somewhere in the metadata panel.
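The first two steps of the build process listed above could look something like the following Python sketch. This is only an assumption about the shape of the output, not the real build code: the function names, the taxonomy id, the use of gloss and desc children inside each category, and the sample record are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_hashtag_taxonomy(records, taxonomy_id="hashtags"):
    """Turn (hashtag, gloss, description) records into a TEI taxonomy element.

    The taxonomy id and the gloss/desc layout are assumptions for this sketch.
    """
    ET.register_namespace("", TEI_NS)
    taxonomy = ET.Element(f"{{{TEI_NS}}}taxonomy", {f"{{{XML_NS}}}id": taxonomy_id})
    for hashtag, gloss, description in records:
        category = ET.SubElement(
            taxonomy, f"{{{TEI_NS}}}category", {f"{{{XML_NS}}}id": hashtag}
        )
        ET.SubElement(category, f"{{{TEI_NS}}}gloss").text = gloss
        ET.SubElement(category, f"{{{TEI_NS}}}desc").text = description
    return taxonomy


def allowed_targets(taxonomy):
    """List the pointer values a schema could accept for catRef/@target."""
    return sorted(
        "#" + category.get(f"{{{XML_NS}}}id")
        for category in taxonomy.iter(f"{{{TEI_NS}}}category")
    )


if __name__ == "__main__":
    # Made-up sample record, purely for illustration.
    sample = [("translation", "Translated poem", "The poem is a translation.")]
    taxonomy = build_hashtag_taxonomy(sample)
    print(ET.tostring(taxonomy, encoding="unicode"))
    print(allowed_targets(taxonomy))
```

Deriving both the taxonomy and the value list from the same records keeps the schema in step with the db table, which is the point of the arrangement described above.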
I added a new Schematron rule that catches the problems with eye-, half- and identical rhymes. It instantly found over 500 errors in Blackwoods 1820 alone.
So instead of putting it directly into the schema, I've made a special file:
utilities/tempSchematron.sch
Any of us can right-click on a folder in the xml tree, choose "Validate / Validate with Schema", then choose that schema to find those specific errors and fix a few when there's time. Meanwhile our files are still valid against the main schema.
I've fixed the Keepsake 1840 and Chartist 1840, since those were all my own errors. Having seen a few of them, I don't think there's any reliable way of fixing them mechanically, unfortunately; there's a fair range of different ways we've encoded this stuff.
240 minutes
This is the last one to be encoded. I've done some automated stuff in the ODT document and encoded the references; just the article and footnotes to do. 90 minutes.
Late duty.