03/12/18

Permalink 05:08:16 pm, by mholmes, 31 words, 9 views   English (CA)
Categories: Activity log; Mins. worked: 180

Fallout from What Rhymes With...?

More work on this, and lots of work to fix bad rhyme encoding, which the search results now make obvious. Much tedious re-encoding of old poems. Fixed some XSLT bugs too.

30/11/18

Permalink 03:46:27 pm, by mholmes, 80 words, 5 views   English (CA)
Categories: Activity log; Mins. worked: 60

What Rhymes With...? feature

I've been meaning to do this for a long time, and I got the time today to do a quick-and-dirty implementation of a search for all endings that rhyme with a given ending supplied as a param. The results are intriguing, suggesting many encoding errors in the older files. These can all be fixed, of course, but it'll be interesting to follow up on how some of them happened. More work to do, too, on the interface to the feature.
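As a rough illustration of the search idea (not the project's actual implementation, which this post does not describe), here is a minimal Python sketch. It assumes a hypothetical data model in which each stanza is a list of (rhyme label, line ending) pairs; the function name `rhymes_with` and that structure are inventions for this example.

```python
from collections import defaultdict


def rhymes_with(target, stanzas):
    """Return all endings that share a rhyme label with `target` in any stanza.

    `stanzas` is a hypothetical structure: an iterable of stanzas, each a
    list of (rhyme_label, ending) pairs. Labels are only meaningful within
    a single stanza, so grouping is done stanza by stanza.
    """
    target = target.lower()
    found = set()
    for stanza in stanzas:
        # Group the endings of this stanza by their rhyme label.
        by_label = defaultdict(set)
        for label, ending in stanza:
            by_label[label].add(ending.lower())
        # Any group containing the target contributes its other members.
        for endings in by_label.values():
            if target in endings:
                found |= endings - {target}
    return sorted(found)
```

A search like this surfaces encoding errors quickly, because a mislabelled line shows up as an ending that plainly does not rhyme with the one requested.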

Permalink 03:44:38 pm, by mholmes, 48 words, 4 views   English (CA)
Categories: Activity log; Mins. worked: 120

Implemented no-straight-apostrophe rule

Added the Schematron and the QuickFix, documented them, and then trawled through all the existing documents to fix problems (there were hundreds). This applies only to the text element descendants for now, but I also fixed some apostrophes/straight quotes in the db itself to avoid future problems.
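The real rule is a Schematron constraint with a QuickFix, and neither is reproduced here. As a rough Python analogue of the check only, the sketch below reports text nodes under the TEI text element that still contain straight apostrophes or quotes. The function name `find_straight_quotes` is hypothetical, and the sketch assumes a single, non-nested text element per document.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
STRAIGHT = ("'", '"')


def find_straight_quotes(tei_xml: str) -> list[str]:
    """Return the text chunks inside <text> that contain straight quotes.

    Only descendants of the TEI <text> element are checked, mirroring the
    scope described above; the header is ignored. Assumes <text> elements
    are not nested, otherwise chunks would be reported more than once.
    """
    root = ET.fromstring(tei_xml)
    hits = []
    for text_el in root.iter(f"{{{TEI_NS}}}text"):
        for chunk in text_el.itertext():
            if any(ch in chunk for ch in STRAIGHT):
                hits.append(chunk.strip())
    return hits
```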

29/11/18

Permalink 04:22:58 pm, by mholmes, 176 words, 4 views   English (CA)
Categories: Activity log; Mins. worked: 180

ODD updates; more work on SQL to TEI

After adding some documentation on the SQF QuickFix features to the ODD file, I got annoyed by the fact that the SQF code in the ODD file was making it technically invalid. Actually, it was the TEI code embedded in XSLT embedded in SQF which was causing the problem, so I refactored it to use XSL element and attribute constructors instead of literal elements, and the problem was fixed. You have to be a little careful with namespaces when doing that, though.

Then I moved on to implementing the requirement for curly apostrophes. Actually I'm going to generalize that to curly quotes everywhere. Before we can make rules and enforce them, we need to make sure that we're not importing more straight quotes whenever we do SQL-to-TEI processing, so I've been working on those conversion routines to make them handle the apostrophes and quotes in the db. In the process, I learned a bit about XSpec and wrote my first XSpec unit tests. This looks like it may be a valuable testing tool.
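The conversion routines themselves are XSLT and are not shown in this post. To illustrate the general technique, here is a minimal Python sketch of a straight-to-curly conversion under a simple assumption: a quote at the start of the string, or after whitespace or an opening bracket, is an opening quote, and every other quote is a closing quote or apostrophe. The function name `curl_quotes` is hypothetical.

```python
import re


def curl_quotes(s: str) -> str:
    """Replace straight quotes and apostrophes with curly equivalents.

    Heuristic only: a quote at the start of the string, or following
    whitespace or an opening bracket, is treated as opening; anything
    else is treated as closing (which also covers apostrophes).
    """
    # Double quotes: opening first, then everything left is closing.
    s = re.sub(r'(^|[\s(\[])"', "\\1\u201c", s)
    s = s.replace('"', "\u201d")
    # Single quotes: an opening double quote may also precede an opening single.
    s = re.sub(r"(^|[\s(\[\u201c])'", "\\1\u2018", s)
    s = s.replace("'", "\u2019")
    return s
```

A known limitation of this heuristic is a leading elision such as 'Tis, which it treats as an opening quote rather than an apostrophe. Cases like that are the sort of thing unit tests are good at pinning down.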

28/11/18

Permalink 03:54:46 pm, by mholmes, 135 words, 5 views   English (CA)
Categories: Activity log; Mins. worked: 200

Better documentation, better poem rendering

Meeting and group tagging session, during which I did the following:

  • Created a link inside the documentation HTML for a "cheatsheet" which is actually a constrained view of some of the documentation, intended for printing.
  • Discussed with the RAs the need for a simpler way to check your rhyme encoding, which resulted in a new feature in the poem rendering that lets you turn highlighting of individual rhyme labels on and off.
  • Added a new constraint on lg/@rhyme, which uses a regex to constrain the content and allows a new value of "NONE", for which we will add some processing.
  • Fixed a bunch of old encodings which were no longer valid against the new constraint.
  • Discussed encoding of prose content before and inside poems, developed tagging guidelines for it, and added them to the documentation.
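The post does not give the regex used for lg/@rhyme, so the pattern below is purely hypothetical: it assumes a rhyme scheme is written as a run of lowercase labels (for example "abab") and that the literal "NONE" is the only other permitted value. The sketch only shows how a constraint of this kind behaves.

```python
import re

# Hypothetical pattern, not the project's actual regex: either the
# literal NONE, or one or more lowercase rhyme labels such as "abab".
RHYME_PATTERN = re.compile(r"NONE|[a-z]+")


def is_valid_rhyme(value: str) -> bool:
    """Return True if the whole attribute value matches the pattern."""
    return RHYME_PATTERN.fullmatch(value) is not None
```

Matching the whole value, rather than searching within it, is what makes older encodings with stray spaces or punctuation fail validation.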

21/11/18

Permalink 04:58:12 pm, by mholmes, 26 words, 9 views   English (CA)
Categories: Activity log; Mins. worked: 200

Training and more progress with schema, documentation and quickfixes

Made a bit of progress before and after the morning's training; also tagged a longish poem myself as part of testing. All basically working as intended.

20/11/18

Permalink 03:52:24 pm, by mholmes, 81 words, 6 views   English (CA)
Categories: Activity log; Mins. worked: 360

Schematron quick-fixes for tagging

After a lot of reading and experimentation, I think I have a robust way to enable automated tagging of blocks of content, using Schematron Quick Fixes. Right now I have one for turning double-dashes into em dashes, and (more important, and more difficult) one for auto-tagging a block of text as a stanza. I've also added processing into the ODD file build to retrieve the code template keystroke shortcuts from the Oxygen file and build an explanatory table in the documentation.
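The quick-fixes themselves are written in SQF and XSLT and are not shown here. As a rough Python sketch of the two transformations described, the function below wraps a block of plain text in lg and l elements, one l per non-empty line, and turns double hyphens into em dashes along the way. The function name `tag_stanza` and the two-space indentation are assumptions for this example.

```python
from xml.sax.saxutils import escape

EM_DASH = "\u2014"


def tag_stanza(block: str) -> str:
    """Wrap each non-empty line of `block` in <l>, and the whole in <lg>.

    Double hyphens become em dashes, and XML special characters in the
    text are escaped so the result is well-formed.
    """
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    body = "\n".join(
        f"  <l>{escape(ln.replace('--', EM_DASH))}</l>" for ln in lines
    )
    return f"<lg>\n{body}\n</lg>"
```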

16/11/18

Permalink 04:31:39 pm, by mholmes, 20 words, 11 views   English (CA)
Categories: Activity log; Mins. worked: 180

HOCR working and tested

It's now a one-line command to add OCR to any given year in the TEI files. Did both 1820 and 1830 already.

14/11/18

Permalink 04:59:35 pm, by mholmes, 45 words, 10 views   English (CA)
Categories: Activity log; Mins. worked: 120

HOCR process two-thirds done

My HOCR process is now able to find all the candidate poems in a year, download all the images, run HOCR on them, and then start to process the original file to include the comment in it. But the last phase is a little tricky.

Permalink 04:57:18 pm, by mholmes, 59 words, 8 views   English (CA)
Categories: Activity log; Mins. worked: 120

Training day

Did another round of group training, where we all discussed a lot of our processes and tagging practices. We've decided to dispense with any encoding which can be derived from the structure or other tagging -- so for instance lg/@type is not needed right now, because all the simple types are inferrable, and the complex ones need expertise.

Digital Victorian Poetry Project

This is a blog to track work on the DVPP project. Prior to this blog being created, posts were made in the Depts blog.