Digital Victorian Poetry Project

December 4, 2019

Fixes for diagnostics; fixes for OCR

Posted by on 04 Dec 2019 in Activity log

Today I've added a couple of new features and fixed an annoying bug in the diagnostics; at this point, I'm not aware of anything more that's broken or missing there.

I've also had to rewrite some of the OCR build process, because recently OCRed material was coming out slightly borked: one word per line, instead of nicely lineated. The cause turns out to be a change in Tesseract's output. It now seems to produce a wider range of line-like span classes, some of which I've never seen before, classifying some poetic lines as captions and others as callouts; it also now produces indented XHTML at the line level, which adds extra returns. It took a little tweaking to get it fixed, and I'll have to keep an eye on it.

I've also added a control parameter to the OCR process that lets you overwrite any existing OCR in a file. Normally we don't want to do that, because we may be OCRing a collection only because a couple of new items have been added to it, and we don't want to re-do all the others; but in cases where the process went wrong in some way, it's just what we need. 180 minutes.
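As a rough illustration of the kind of fix described here (not the project's actual build code, which is not shown in this log), the sketch below rebuilds lines from hOCR-style XHTML without depending on the class of the line-like span: any element that directly contains `ocrx_word` children is treated as one line, and whitespace is normalised so indentation in the markup adds no extra returns. The function names, the `overwrite` parameter name and the sample markup are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

WORD_CLASS = "ocrx_word"  # standard hOCR class for a single recognised word


def classes(element):
    """Return the list of class tokens on an element (empty if none)."""
    return (element.get("class") or "").split()


def lines_from_hocr(hocr_text):
    """Return one string per line-like element, whatever its class is called."""
    root = ET.fromstring(hocr_text)
    lines = []
    for element in root.iter():
        # A "line" is any element with word spans as direct children.
        words = [child for child in element if WORD_CLASS in classes(child)]
        if not words:
            continue
        # Collapse whitespace inside each word, then join words with one space.
        tokens = [" ".join("".join(w.itertext()).split()) for w in words]
        lines.append(" ".join(t for t in tokens if t))
    return lines


def should_ocr(existing_text, overwrite=False):
    """Hypothetical control flag: re-OCR only if empty or explicitly forced."""
    return overwrite or not existing_text.strip()


# Invented sample: the second line carries a non-default class and indentation.
SAMPLE = """<div xmlns="http://www.w3.org/1999/xhtml" class="ocr_page">
  <span class="ocr_line">
    <span class="ocrx_word">Tears,</span>
    <span class="ocrx_word">idle</span>
    <span class="ocrx_word">tears</span>
  </span>
  <span class="ocr_caption">
    <span class="ocrx_word">I</span>
    <span class="ocrx_word">know</span>
    <span class="ocrx_word">not</span>
  </span>
</div>"""

if __name__ == "__main__":
    for line in lines_from_hocr(SAMPLE):
        print(line)
```

Keying on the word spans rather than on a fixed list of line classes means a newly introduced line-like class should not break lineation, at the cost of trusting that words only ever appear directly inside lines.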

November 27, 2019

Minor tweaks and cleanup

Posted by on 27 Nov 2019 in Activity log

Made a few changes to the way the build process happens and what the output pages look like:

Poem files are now built as a single operation, which means a single xsl:key is instantiated once instead of many times, cutting a few minutes off the process.
In poems with no transcription, the page-images are now shown at readable size by default.
In notes fields, text marked with asterisks in the db is now converted into HTML i elements, so it looks cleaner.
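The asterisk conversion in the last item can be sketched roughly as follows. This assumes that a pair of asterisks around a run of text marks italics, which the log does not spell out, and it is written in Python purely for illustration; the site's own transformation may well be done elsewhere in the build. The function name is hypothetical.

```python
import html
import re

# Assumption: *text* (no asterisks inside) marks an italic run in a notes field.
ITALIC = re.compile(r"\*([^*]+)\*")


def note_to_html(note):
    """Escape a plain-text note, then turn *...* runs into <i>...</i>."""
    escaped = html.escape(note, quote=False)
    return ITALIC.sub(r"<i>\1</i>", escaped)


if __name__ == "__main__":
    print(note_to_html("See *Good Words* & *Once a Week*."))
```

Escaping before substituting keeps any stray angle brackets or ampersands in the db text from turning into markup. A lone asterisk is left untouched, but two unrelated asterisks in one note would be read as a pair, so real data would need checking.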

120 minutes.

November 20, 2019

Knocking items off the TODO list

Posted by on 20 Nov 2019 in Activity log

I worked through a few things on the growing list today:

Added series/vol/num/page info to the listings of poems.
Figured out, implemented and documented an encoding and rendering strategy for drop-caps, which are very gnarly indeed.
Added a periodical listings page and put it on the menu.
Added new personography items for new team members, reworked the code for long and short bios, and contacted old and new team members to get bios.
Increased the scale height for encoded lines in the graphs.
Other minor tweaks.

Full day, 420 minutes.

November 18, 2019

Changes to db structure

Posted by on 18 Nov 2019 in Activity log

After consultation with AC, and some data cleanup, I've removed some old fields that were no longer used:

Text
Illustrations
OLD Illustrator
Links
Page-image Notes

as well as the old poem_search table.

ALTER TABLE `poems` DROP COLUMN `po_text`;
ALTER TABLE `poems` DROP COLUMN `po_illustrations`;
ALTER TABLE `poems` DROP COLUMN `po_illustrator`;
ALTER TABLE `poems` DROP COLUMN `po_links`;
ALTER TABLE `poems` DROP COLUMN `po_imageNotes`;

DROP TABLE `poem_search`;

120 minutes.

November 13, 2019

More English 500 prep

Posted by on 13 Nov 2019 in Activity log

I've now finished a rewrite of the worksheets, and the package is ready for testing on Monday. 180 minutes.

November 6, 2019

Working on the English 500 package, and other stuff

Posted by on 06 Nov 2019 in Activity log

Spent most of the day reworking the materials we had for English 500, because the project has changed so much since we last taught it, and the class time has been collapsed to a single session. I still have some worksheets to write, but the build process is working, the demo materials are there, and the first two worksheets are done.

Also had project meetings and discussions on indexing issues and possible solutions; we will probably add a new poem field, "Attributions proofed", to the database, to handle situations in which RAs have made changes to people in the db, which will require that the attributions to the poem record be proofed again.

360 minutes.

October 23, 2019

SP's first encoding

Posted by on 23 Oct 2019 in Activity log

Worked through the entire encoding process with SP and she finished her first poem. 200 minutes.

October 22, 2019

DB updates and building periodical/year index pages

Posted by on 22 Oct 2019 in Activity log

Added the new "Proofed" column to the db, and wrote the required handling for it in the db code and the diagnostics; I've also gone through AC's Google Doc record of checked runs and scripted the setting of the value. Also spent some time working on generating poem index pages per periodical-year, which will be very useful for browsing purposes; I'm considering an AJAX-based browse interface that can use either the HTML or JSON versions of these pages to allow drill-down discovery. 240 minutes.
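To make the periodical-year index idea concrete, here is a minimal sketch of grouping poem records by periodical and year and serialising one JSON page per group, which a browse interface could then fetch for drill-down. The record fields (`id`, `title`, `periodical`, `year`) and the file-naming scheme are invented for the example and are not taken from the project.

```python
import json
from collections import defaultdict


def build_year_index(poems):
    """Map a hypothetical file name to the JSON text for one periodical-year."""
    groups = defaultdict(list)
    for poem in poems:
        groups[(poem["periodical"], poem["year"])].append(poem)

    pages = {}
    for (periodical, year), items in sorted(groups.items()):
        page = {
            "periodical": periodical,
            "year": year,
            # Keep only what a listing needs, in a stable order.
            "poems": [
                {"id": p["id"], "title": p["title"]}
                for p in sorted(items, key=lambda p: p["id"])
            ],
        }
        pages[f"{periodical}_{year}.json"] = json.dumps(page, indent=2)
    return pages


# Invented sample records.
SAMPLE_POEMS = [
    {"id": "p2", "title": "Second", "periodical": "GoodWords", "year": 1865},
    {"id": "p1", "title": "First", "periodical": "GoodWords", "year": 1865},
    {"id": "p3", "title": "Third", "periodical": "OnceAWeek", "year": 1860},
]

if __name__ == "__main__":
    for name, text in build_year_index(SAMPLE_POEMS).items():
        print(name)
        print(text)
```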

October 16, 2019

Work done from September 11 ff

Posted by on 16 Oct 2019 in Activity log

This just records the work done while I was travelling, and since I got back.

The build now has HTML and CSS validation built into it, so it fails if there are errors or typos in, for example, the CSS. This also required me to make the documentation file valid XHTML5, which is done, and I had to fix a few dozen CSS errors that had built up over the years.
The static search has now been built into the site. By default it draws from the master branch of the endings/staticSearch GitHub repo, which is intended to be pretty stable, but I'm also using DVPP as the main large-scale testing platform for the development of staticSearch, so when built with the dev branch it has newer features, usually broken. I'm currently working on getting date filters functional.
At AC's request, I've added a new table to the diagnostics, tracking the encoding and proofing of poems by decade-year and periodical; this should help the RAs figure out what to work on next.
One TEI-generation process was run remotely while I was away, demonstrating that this works well.
Two presentations relating to DVPP were made at TEI 2019.
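The diagnostics table mentioned in the list above, tracking encoding and proofing by decade and periodical, could be tallied along the following lines. This is only a sketch: it assumes each poem record carries a year plus boolean `encoded` and `proofed` flags, and that rows are bucketed by decade. All of those field names and the bucketing rule are guesses, not details given in the log.

```python
from collections import defaultdict


def progress_table(poems):
    """Count total, encoded and proofed poems per (periodical, decade)."""
    table = defaultdict(lambda: {"total": 0, "encoded": 0, "proofed": 0})
    for poem in poems:
        decade = poem["year"] // 10 * 10  # e.g. 1867 -> 1860
        row = table[(poem["periodical"], decade)]
        row["total"] += 1
        row["encoded"] += int(bool(poem["encoded"]))
        row["proofed"] += int(bool(poem["proofed"]))
    return dict(sorted(table.items()))


# Invented sample records.
SAMPLE_RECORDS = [
    {"periodical": "GoodWords", "year": 1865, "encoded": True, "proofed": True},
    {"periodical": "GoodWords", "year": 1867, "encoded": True, "proofed": False},
    {"periodical": "GoodWords", "year": 1871, "encoded": False, "proofed": False},
]

if __name__ == "__main__":
    for (periodical, decade), row in progress_table(SAMPLE_RECORDS).items():
        print(periodical, decade, row)
```

Rows with a low encoded or proofed count relative to the total are the ones that would suggest where to work next.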

June 27, 2019

Refreshed and encoded TEI 2019 submission

Posted by on 27 Jun 2019 in Activity log

We were accepted, with some minor requests for changes from reviewers. After consulting with the team, I've made those changes, and encoded the article in jTEI. The others will proof it. 90 minutes.