The diagnostics are now showing lots of errors in filenames and links. The bulk of them were amenable to semi-automated fixes, so I've handled those that way, and we're now down to 64 bad filenames and 4 bad links.
I've made the updates to the database by rsyncing most of the dev version over the live codebase (keeping a backup).
The changes are:
- There's now a "First Line" field in the Poems table, where you can add the first line of a poem which otherwise would be indistinguishable from others with similar titles ("Song", "Sonnet" etc.).
- The "Related poems" field now uses a combination of the Title, the First line (if there is one), and the id of a poem to give you something which (I hope) is findable and unique for each poem. When poems start with punctuation (quotes or whatever) they'll sort to the beginning, so you usually need to scroll past the quoted titles to get to the regular ones.
- There is now a publicly accessible read-only view of the database here. I haven't linked to it from anywhere or publicized it yet; I think the plan is to hide the notes field in this view, which will take a bit of work.
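As a rough illustration of how the "Related poems" labels are put together and why quoted titles float to the top, here is a minimal Python sketch. The function name, the label layout (square brackets for the first line, "id N" in parentheses) and the sample rows are all assumptions made for the example; the real field is built in the database and its exact format may differ.

```python
from typing import List, Optional, Tuple


def related_poem_label(title: str, first_line: Optional[str], poem_id: int) -> str:
    """Build a findable, unique label from title, optional first line, and id.

    The layout here is hypothetical; only the three ingredients come from
    the description above.
    """
    parts = [title.strip()]
    # Only include the first line when one has actually been entered.
    if first_line and first_line.strip():
        parts.append(f"[{first_line.strip()}]")
    # The id guarantees uniqueness even when title and first line collide.
    parts.append(f"(id {poem_id})")
    return " ".join(parts)


# Invented sample rows: (title, first line, id).
poems: List[Tuple[str, Optional[str], int]] = [
    ("Song", "An example first line", 12),
    ("Song", None, 47),
    ('"A Quoted Title"', None, 3),
]

# A plain string sort puts punctuation such as a leading quote before letters.
labels = sorted(related_poem_label(t, f, i) for t, f, i in poems)
for label in labels:
    print(label)
```

In this sketch the quoted title sorts first simply because the quote character has a lower code point than any letter; the database's own collation may behave slightly differently.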
One note: the poems table has an additional field which just amalgamates the text values of the other major fields into a single simple search field, which the WordPress plugin used. Since the plugin is no longer working, and it was going to be difficult to combine the trigger I needed for the new First line field with the original trigger, which also ran on update on that table, I've removed the original trigger. That means the plugin would not work even if it were revived, so if we need it at some point, we'll have to figure out how to combine those two triggers in a single operation.
We now have diagnostics that give stats and test both for ill-formed image filenames on the filesystem and for database records that point at images that aren't there. This is a good start, and there's plenty for people to get their teeth into.
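The two checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filename pattern, the function names and the sample data are hypothetical, and the real diagnostics run as part of the project's build rather than as a script like this.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical naming rule: lowercase letters, digits, underscores and
# hyphens, followed by .jpg or .png. The project's real rule may differ.
VALID_IMAGE_NAME = re.compile(r"[a-z0-9_-]+\.(jpg|png)")


def ill_formed_filenames(filenames: Iterable[str]) -> List[str]:
    """Return the filenames that do not match the naming rule, sorted."""
    return sorted(n for n in filenames if not VALID_IMAGE_NAME.fullmatch(n))


def missing_images(record_links: Dict[int, str], filenames: Iterable[str]) -> Dict[int, str]:
    """Return {record id: image name} for records whose image is not on disk."""
    on_disk = set(filenames)
    return {rid: name for rid, name in record_links.items() if name not in on_disk}


# Invented sample data standing in for a directory listing and a db export.
files = ["poem_001.jpg", "Poem 002.JPG", "poem_003.jpg"]
links = {1: "poem_001.jpg", 2: "poem_002.jpg", 3: "poem_003.jpg"}

bad_names = ill_formed_filenames(files)
bad_links = missing_images(links, files)
print(f"{len(bad_names)} bad filenames, {len(bad_links)} bad links")
```

Note that a single badly named file tends to show up in both reports, once as an ill-formed filename and once as the missing target of whichever record expected the correct name.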
KB came in to get SVN and Oxygen installed, and is now working on edits. Initial setup and basic process documentation is good, but I also need to do more in the way of documenting markup practice as we move to more complicated edits.
Inserted one of the poem page images as requested by KB; this works as expected, so I can go ahead with the same model for the rest of them. Did some other fixes from the list of TODOs while I was in there.
The project needs some of the improvements made to the adaptive db after this project started, such as the read-only interface, so I've copied the current data over into the dev db, checked out the latest trunk from svn, and hacked away till everything worked. The process turned up some oddities in the adaptive db code; I've fixed them and copied the fixes back to the repo. I'm now in a position to add and alter the fields that need to be changed, doing that only in dev first.
61 of the original 2014 sample of 100 poems were marked up in the original repo, albeit a bit shakily; I've converted those, added them to the new repo, and fixed all the validation problems.
This morning we had the intro session for the RAs, and they'll be starting tomorrow. I've now begun work on the diagnostics which will track our progress, as well as adding ordinary backups to the build file. The build file now has two combining targets: do_all, which is what will happen on Jenkins, and admin, which is what I'll run locally to transform or generate things that need to be committed to svn or stored locally. I've also laboriously added the English 500 encoded XML poems to the repo, organized by journal and year (it's clearer than using the variable vol/issue kind of organization), and I'll be doing the same with the original hundred poems we did a few years ago. Steady progress.
I've converted the old VPN ODD and associated files (XML and schema-building files) to DVPP files, and created both a schema build with Schematron extraction, and a general build process. I've set up a Jenkins job, got validation with RNG and Schematron working, and set up a cron job which puts the XML version of the db on home1t where the build process can retrieve it. Coming along nicely.
A couple of TODOs came in just when I needed a break from something else, so I polished them off.