SQL-to-TEI conversion now updated to take account of the changes to the db; ditto with schema and documentation; and finally the poem rendering XSLT.
Did the updates on the dev db first, then on the live db. Problems encountered were that extensions to field lengths in the poems table seem to have hit MySQL limits, in particular the size limit on a row, which is 65,535 bytes. Converting some columns to TEXT instead of VARCHAR solves the problem, although of course there's a performance penalty. I also had to delete some indexes which were hitting limits. Below is the process in half-code-half-comments. Now I have to update my TEI generation code to take account of the changes.
/* This file is the working SQL file for changes to the db made in December 2018, per * instructions from AC. * * Make these changes step by step and confirm/check/test/backup before continuing. */ /* FIRST THE SIMPLE THINGS: MAKING TEXT FIELDS LONGER. */ /* Pseudonym field needs triple the length of characters. */ /* First we have to drop some indexes this field is involved in. */ ALTER TABLE `poems` DROP INDEX `idx_po_general`; ALTER TABLE `poems` DROP INDEX `idx_po_pseudonym`; /* Now set the length. */ ALTER TABLE `poems` MODIFY `po_pseudonym` VARCHAR(300); /* Display name ditto. */ ALTER TABLE `persons` MODIFY `prs_displayName` VARCHAR(300); /* Images field needs to handle up to 70 images. This involves changing its type to TEXT. */ ALTER TABLE `poems` MODIFY `po_images` TEXT(4096); /* Add a new allonym text field. */ ALTER TABLE `poems` ADD COLUMN `po_allonym` VARCHAR(300) AFTER `po_allonymous`; /* Add new hashtag field. */ ALTER TABLE `poems` ADD COLUMN `po_hashtags` TEXT(1024) AFTER `po_links`; /* SERIES FIELD FOR POEMS. */ ALTER TABLE `poems` ADD COLUMN `po_series` int(11) default NULL AFTER `po_organ`; /* NOW UPDATE local_classes.php and test. */ /* local_classes.php: * set prs_displayName to 300 length. * set po_pseudonym to 300 length. * set po_images to 4096 length. * add new allonym field. * add new hashtags field. * add new series field. * * */ /* NOW THE HARD STUFF: TURN THE TRANSLATOR FIELD INTO A ONE-TO-MANY LINK TO PERSONS. */ /* First create the linking table. */ DROP TABLE IF EXISTS `poems_to_translators`; CREATE TABLE `poems_to_translators` ( `ptt_id` int(11) NOT NULL auto_increment, `ptt_po_id` int(11) default NULL, `ptt_tr_id` int(11) default NULL, PRIMARY KEY (`ptt_id`), KEY `fk_ptt_translator` (`ptt_tr_id`), KEY `fk_ptt_poem` (`ptt_po_id`), CONSTRAINT `fk_ptt_translator` FOREIGN KEY (`ptt_tr_id`) REFERENCES `persons` (`prs_id`) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT `fk_ptt_poem` FOREIGN KEY (`ptt_po_id`) REFERENCES `poems` (`po_id`) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; /* NOW UPDATE local_classes.php and test. */ /* Next, we try to dicsover candidates for translators in the persons table. */ /* This is the XQuery to generate the SQL: */ --------------------------------------- declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization"; declare option output:method "text"; let $poemsWithTranslators := //table_data[@name='poems']/row[string-length(field[@name='po_translator']) gt 0], $links := for $p in $poemsWithTranslators let $transName := normalize-space($p/field[@name='po_translator']/text()), $candidates := //table_data[@name='persons']/row[field[@name='prs_displayName'] = $transName] return if (count($candidates) = 1) then let $poId := $p/field[@name='po_id'], $prsId := $candidates/field[@name='prs_id'] return concat( 'INSERT INTO `poems_to_translators` (`ptt_po_id`, `ptt_tr_id`) VALUES ("', $poId, '", "', $prsId, '");', ' ') else (concat('/* No match found for ', $transName, '. */ ')) return $links --------------------------------------- /* Run the resulting SQL against the db to insert * the new records.*/ /* Change the local_classes.php file to show "OLD Translator field". */ /* Download fresh versions of the db and commit. */ /* Run XPath against the db to get comma-separated lists of poem ids where: * a) a new record has been inserted linking to the poem table, and * b) no match was found so a record will have to be manually created. */
Tomorrow is update day for the db, so I've built a full plan with SQL and XQuery code ready to execute. It looks like we can link about 730 of the 1530 or so translators directly to existing person records, so although those will need to be checked, that's a lot faster than doing them all manually. The remaining ones will have to be handled manually, though. I've also pulled the banner from the old VPN site ready to make a rough home page for the site, and done some more thinking about a more sophisticated CSS parser in XSLT.
It's very common to find the same pattern of indents throughout the stanzas of a poem. Right now, people are encoding these mechanically and repetitively, which is OK but clutters the XML and takes time. A better option would be to use the TEI rendition element with the @selector attribute, like this:
<rendition selector="lg"> margin-left: 6rem; </rendition> <rendition selector="lg>l:nth-child(2), lg>l:nth-child(4), lg>l:nth-child(7), lg>l:nth-child(10)"> margin-left: 1.1em; </rendition>
to specify that all stanzas have a left margin of 6rem, and lines 2, 4, 7, and 10 of each stanza are additionally indented.
This is easy to code but hard to process. I've had a first shot at figuring out how to do it, and so far so good, although as the selectors get more gnarly the code will have to be revisited. It's good enough for testing purposes at any rate.
The progress tracking output was borked in a couple of ways, one cosmetic (the chart display had hundreds of stacked labels on the X axis) and one arithmetical (I was miscalculating the projected duration based on current progress). I've fixed both of those issues.
Worked on my poem-encoding Quick Fix so that it can now tag a whole poem in one go. As part of developing and testing, I also encoded a couple of poems myself, and did some tweaks to rendering and processing. I also ran the OCR task against the 1840 poems to give myself a bit more choice in picking test poems. Updated the documentation as well.
Met with KF and AC and discussed a number of issues, as a result of which I've added schema support, processing support and documentation for handling refrains, eliminated the hack that was used to handle them before, tightened up the rhyme label attribute constraints as a result, and fixed the old encoding approach from the data. I've done the same for ornamental horizontal lines. In the process I encoded a couple of poems myself, and fixed some bugs in rendering that were annoying me.
More work on this, and lots of work to fix bad rhyme encoding which is now obvious in the results. Much tedious re-encoding of old poems. Fixed some XSLT bugs too.
I've been meaning to do this for a long time, and I got the time today to do a quick-and-dirty implementation of a search for all endings that rhyme with a given ending supplied as a param. The results are intriguing, suggesting many encoding errors in the older files. These can all be fixed, of course, but it'll be interesting to follow up on how some of them happened. More work to do, too, on the interface to the feature.
Added the Schematron and the QuickFix, documented them, and then trawled through all the existing documents to fix problems (there were hundreds). This applies only to the text element descendants for now, but I also fixed some apostrophes/straight quotes in the db itself to avoid future problems.