JT has rewritten the JSON and search page generation code so that filter fieldsets on the search page now have ids which match their JSON files. That means the JS can trickle-retrieve them after startup, and the normal sequence can be superseded if a search is initiated during the process. I have the JS retrieving them successfully; now I just have to tweak the search-preparation code so it can get specific ones more quickly if required, and then tweak the search execution code to use the new split-out filters instead of the old docs.json. After that, we can dispense with docs.json. 120 minutes.
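As a rough illustration of the trickle-retrieval idea described above, here is a minimal sketch rather than the project's actual code. The class name `FilterLoader`, the method names, the injected `fetchJson` function, and the `filters/<id>.json` path are all assumptions made up for the example. Filters are fetched one at a time in the background, but a search can demand specific ones immediately without causing duplicate requests.

```javascript
// Hypothetical sketch: FilterLoader, fetchJson and the 'filters/' path are
// illustrative assumptions, not the real search code.
class FilterLoader {
  constructor(ids, fetchJson) {
    this.pending = [...ids];   // ids still waiting for background retrieval
    this.loaded = new Map();   // id -> parsed JSON already retrieved
    this.inFlight = new Map(); // id -> Promise for a request in progress
    this.fetchJson = fetchJson;
  }

  // Retrieve one filter, reusing a completed or in-progress request.
  get(id) {
    if (this.loaded.has(id)) return Promise.resolve(this.loaded.get(id));
    if (this.inFlight.has(id)) return this.inFlight.get(id);
    // Take the id out of the background queue so it is not fetched twice.
    this.pending = this.pending.filter((p) => p !== id);
    const promise = this.fetchJson(`filters/${id}.json`).then((data) => {
      this.loaded.set(id, data);
      this.inFlight.delete(id);
      return data;
    });
    this.inFlight.set(id, promise);
    return promise;
  }

  // Background trickle: fetch whatever is still pending, one at a time.
  async trickle() {
    while (this.pending.length > 0) {
      await this.get(this.pending[0]);
    }
  }

  // Called when a search starts: fetch the needed filters right away, in parallel.
  need(ids) {
    return Promise.all(ids.map((id) => this.get(id)));
  }
}
```

In use, the page would call `trickle()` once after startup, and the search-preparation step would `await loader.need([...])` with the ids of the filters the user has actually selected. Error handling (a failed request would currently stop the trickle) is left out to keep the sketch short.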
I've now captured the three additional manuscripts we talked about at Tuesday's meeting, and added pb elements to all the manuscripts ready for transcription, so we now have eight manuscripts ready to go, comprising over 2,000 pages; that should keep us going.
I've also added keystroke shortcuts for all the Wendat characters to the Oxygen project, and they're documented in the ODD file and generated HTML.
I've used a pattern based on Shift + Alt + key, or (on Mac) Shift + Option + key; we'll have to see if that works for people (it'll certainly work on the student workstations in HCMC).
I've left French out of the equation for now because the simplest option might just be to have the transcribers use a French keyboard, which they'll presumably be familiar with -- and if they're not, it's more valuable for them to learn a standard French keyboard layout than a made-up system that works only on one project.
I still have a lot of work to do on the TEI encoding for the entries; I've decided I'm not happy with the gramGrp approach, so I'm thinking again. But there's lots of transcription work we can be doing anyway.
Including Tuesday's meeting and trial encoding, 400 minutes.
Did the reviews assigned to me for DH 2020 and finished proofing SB's article, sending my fix list to the team. 120 minutes.
Tomorrow we'll be encoding some sample content, so I've tweaked the page-image view a bit to make it friendlier for that purpose, and outlined a couple of pages' worth of questions we'll need to attend to, which we should be able to answer through the process of encoding. Armed with those answers, I should be able to nail down the schema fairly tightly, and also work on the keystroke shortcuts and the documentation. 180 minutes.
Fixed a bug in the Oxygen plugin, and raised a ticket on the tei_customization.odd schema. Also proofed half of an article for Issue 12 of the journal. 120 minutes.
Find all running queries:
SELECT pid, datname, usename, query, state,
       now() - pg_stat_activity.query_start AS duration
FROM pg_stat_activity;
Refresh a materialized view:
REFRESH MATERIALIZED VIEW schema_name.view_name;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA app TO user_name;
GRANT SELECT ON ALL TABLES IN SCHEMA app TO user_name;
Remember to grant select to whatever read-only users need access or they won't be able to read this view after the refresh.
Today I've added a couple of new features and fixed an annoying bug in the diagnostics; at this point, I'm not aware of anything more that's broken or missing there.
I've also had to rewrite some of the OCR build process, because it turns out that recently-OCRed stuff was coming out slightly borked -- one word per line, instead of nicely lineated. The problem turns out to be caused by some change in the way that Tesseract works; it seems to be producing a wider range of line-like span classes, some of which I've never seen before, classifying some poetic lines as captions, and others as callouts; and it's also now producing indented XHTML at the line level, adding extra returns. It took a little tweaking to get it fixed, and I'll have to watch it a bit. I've added a control parameter to the OCR process that enables you to overwrite any existing OCR in a file; normally we don't want to do that, because we may be OCRing a collection just because a couple of new items have been added to it, and we don't want to have to re-do all the others, but in cases where the process went wrong in some way, it's just what we need. 180 minutes.
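The overwrite switch described above boils down to a simple selection rule, sketched here in JavaScript purely for illustration. The real build process is not shown in this post, and the names `selectFilesForOcr`, `hasOcr` and `overwrite` are hypothetical.

```javascript
// Hypothetical sketch: `hasOcr` and `overwrite` are illustrative names,
// not the project's actual build parameters.
function selectFilesForOcr(files, overwrite = false) {
  // With overwrite on, every file is re-OCRed; otherwise only files
  // that do not yet have OCR are processed.
  return files.filter((file) => overwrite || !file.hasOcr);
}

const collection = [
  { name: 'old-item.xml', hasOcr: true },
  { name: 'new-item.xml', hasOcr: false },
];

// Normal run: only the newly added item is OCRed.
console.log(selectFilesForOcr(collection).map((f) => f.name));
// Recovery run after a bad OCR pass: everything is redone.
console.log(selectFilesForOcr(collection, true).map((f) => f.name));
```

Defaulting the switch to off matches the usual case, where a collection is re-run only because a couple of new items have been added.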