Converted six old handouts from svn training and English 500 for use in our DVPP training, then ran the training. Afterwards, followed up with some fixes to the project file and the transformation of poems. Also noticed that some encoded poem data was being lost to the obsoletes folder, so wrote a diagnostic to find the extent of the problem and generate info for fixing it. Next: fix it, and figure out why it's happening.
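For the record, a diagnostic of this kind could be sketched roughly as follows. This is not the diagnostic I actually wrote; it is a minimal Python sketch which assumes that "encoded" means the file contains at least one TEI l (verse line) element, and that the obsolete files sit as *.xml in a folder called obsoletes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The TEI namespace, in ElementTree's {uri}localname notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def count_verse_lines(xml_source):
    """Count TEI <l> elements in one document (str or bytes)."""
    root = ET.fromstring(xml_source)
    return sum(1 for _ in root.iter(TEI_NS + "l"))


def find_encoded_obsoletes(folder="obsoletes"):
    """Map file name -> number of <l> elements, for obsolete files
    that still contain encoded verse lines.

    ASSUMPTION: the folder name and the "has at least one <l>" test
    are illustrative guesses, not the project's real criteria.
    """
    hits = {}
    for path in sorted(Path(folder).glob("*.xml")):
        count = count_verse_lines(path.read_bytes())
        if count > 0:
            hits[path.name] = count
    return hits
```

Running find_encoded_obsoletes() from the project root would give a dictionary of suspect files and their line counts, which is the kind of list needed to decide what has to be rescued.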
Added to and tweaked the documentation. Made a change to the diagnostics chart so it better reflects our progress towards 15,000 poems. Fixed some bugs. Rebuilt all the TEI.
I've basically completed the documentation in the ODD file ahead of Wednesday's training session, drawing on the old VPN materials but rewriting a lot to bring it in line with our current practice.
Met with AC and generated the following TODO list, some of which I've now done:
- Port over existing respStmts from old file (DONE).
- Reconfigure taxonomies to remove stanza types in favour of linegroup types (DONE).
- Reconfigure schema build process to incorporate full glosses and descs for linegroup types into schema (DONE).
- Reconfigure authorship taxonomy for better nesting and allonymy, and update existing headers accordingly (DONE).
- Add total poems per periodical to the stats (DONE).
- In the HTML poem rendering, replace links to images with thumbnails (DONE).
- In SQL to TEI process, incorporate authorship taxonomy data as catRefs.
- Add automated OCR to TEI building process for specific years (starting with 1820).
Also did a fresh rebuild of the TEI, incorporating new poems since the last one, to confirm that respStmt handling works correctly.
Ported over the original code from the VPN project and updated and tweaked it quite a lot to get better layout options. There's still the main layout to do, and I'll use CSS grid for that. In the process of today's work, I found errors in lots of poem encodings, which I fixed; added Schematron rules to prevent some of them; added missing bits to the taxonomies; and various other updates and fixes. This is all good progress.
There will be some tweaks that we need to handle, but the basic process is complete and we now have over 10,000 TEI files in our repository. Next is working on the Oxygen configuration for encoding. Posting time spent today, including a long phone call with AC, along with time spent at the weekend getting the last wrinkles ironed out.
Spent most of the day tidying up, finishing off, and dealing with edge cases. I'm now able to generate all the TEI files containing all of the information we care about (minus the quoted-in-article thing for now, and the original language of a translation). I'm still struggling with the final two tasks: generating a script for moving obsolete files out of the xml folder, and another for svn-adding new files which didn't exist before. This is just a question of getting my head round sed, which I don't use very often. Many changes to the schema and documentation in this process too.
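The two script-generation tasks boil down to a set difference between the files already in the xml folder and the files just generated. A minimal sketch of that idea, in Python rather than the sed approach I'm wrestling with, is below. The function name, the obsoletes destination folder, and the choice of svn mv for the move are assumptions for illustration only.

```python
import shlex


def build_sync_commands(existing, generated,
                        xml_dir="xml", obsolete_dir="obsoletes"):
    """Return shell command lines for tidying the working copy.

    existing  -- file names currently in the xml folder
    generated -- file names produced by the latest build
    ASSUMPTION: folder names and the use of `svn mv` are hypothetical.
    """
    existing, generated = set(existing), set(generated)
    commands = []

    # Files that exist but were not regenerated are treated as obsolete.
    for name in sorted(existing - generated):
        src = shlex.quote(f"{xml_dir}/{name}")
        dst = shlex.quote(f"{obsolete_dir}/{name}")
        commands.append(f"svn mv {src} {dst}")

    # Files that were generated for the first time need adding to svn.
    for name in sorted(generated - existing):
        commands.append(f"svn add {shlex.quote(f'{xml_dir}/{name}')}")

    return commands
```

The returned lines can be written to a file and reviewed before being run, which seems safer than executing the moves directly.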
For ease of reading and convenience, I've added a section to the documentation which is a rendering of the taxonomies.
I'm now able to correctly re-generate the personography and bibliography data from the database, creating a valid output file. I'm also pulling in page-image data into the facsimile element of the poem files. Steady progress.
We need to do three things when processing the canonical metadata database into TEI XML: 1. update metadata in existing files, 2. create new files for poems never before processed, and 3. detect problems such as multiple existing files for the same id. I'm a good way into this, and the basic structure is set up and appears to be working. Ultimately, I think we'll have to manage the whole operation with ant: build all the output in a new location, then copy it back over the original tree, because we won't be able to both read and write the same file in one XSLT transformation.
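The three cases can be thought of as a simple classification of each database id by how many existing files claim it. The following is only a sketch of that logic in Python (the real work is being done in XSLT); the function name and the shape of the inputs are hypothetical.

```python
from collections import defaultdict


def plan_processing(db_ids, id_by_path):
    """Sort poem ids into the three cases.

    db_ids     -- ids present in the metadata database
    id_by_path -- mapping of existing file path -> poem id in that file
    ASSUMPTION: both inputs are illustrative stand-ins for the real data.
    """
    paths_for_id = defaultdict(list)
    for path, poem_id in id_by_path.items():
        paths_for_id[poem_id].append(path)

    plan = {"update": [], "create": [], "duplicates": {}}
    for poem_id in sorted(db_ids):
        paths = sorted(paths_for_id.get(poem_id, []))
        if not paths:
            plan["create"].append(poem_id)        # case 2: never processed
        elif len(paths) == 1:
            plan["update"].append(poem_id)        # case 1: refresh metadata
        else:
            plan["duplicates"][poem_id] = paths   # case 3: needs a human
    return plan
```

For example, a database with ids 1, 2 and 3, where id 1 has one file, id 2 has none, and id 3 has two, yields one update, one create, and one duplicate report listing both paths for id 3.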
Also reconstructed the two required triggers in the live database to get the related-poems dropdown working again.