The UVic events calendar has no presenter field, so that piece of information can't be extracted easily. The dean wants to be able to do exactly that. I wrote to Dave W and he said they'll include the request for consideration for the next release of the service, whenever that is.
In the meantime, I've written code that looks in the description field for ".." preceding and following zero or more characters, and identifies the enclosed text as the presenter's name. The code also looks for the first instance of the tag <strong class="presenterName"> in the description field. I had to use the strong tag because Dave's code strips span tags from the description field.
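For illustration, here is a minimal Python sketch of that matching logic. It is not the code actually running on the site: the function name find_presenter is hypothetical, and checking the strong tag before the ".." markers is my assumption, since the order isn't stated above.

```python
import re
from typing import Optional

# Assumed marker conventions, taken from the description above:
#   1. <strong class="presenterName">Name</strong>
#   2. ..Name..
STRONG_RE = re.compile(
    r'<strong\s+class="presenterName"\s*>(.*?)</strong>',
    re.IGNORECASE | re.DOTALL,
)
DOTS_RE = re.compile(r"\.\.(.*?)\.\.", re.DOTALL)


def find_presenter(description: str) -> Optional[str]:
    """Return the first presenter name found in the description, or None."""
    # Assumption: the explicit strong tag is tried before the ".." markers.
    for pattern in (STRONG_RE, DOTS_RE):
        match = pattern.search(description)
        if match:
            name = match.group(1).strip()
            if name:  # ignore an empty pair of markers such as "...."
                return name
    return None
```

One limitation of this sketch: an ordinary ellipsis in the description can be mistaken for an opening ".." marker, so the strong tag is the safer of the two conventions.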
For the purposes of exhaustive documentation, this is the process for linking in new image scans to existing documents.
- First, process the images into the appropriate sizes and format (JPEG), and make sure they're all named correctly. This is documented elsewhere on the blog.
- At the command line, go into the folder where your new images are (you'll probably have to repeat this step multiple times if you have different folders). In this case, the set were all in co_60_03.
- Run this on the command line to get a list of all the filenames in a text file:
  dir /a-h /b >list.txt
- Open that list in a text editor, and copy/paste it into Transformer. You need to turn it into a list of TEI <graphic> elements.
- Run the Transformer sequence called scan_file_list_to_xml.seq.xml. You'll find this in the root of the Colonial Despatches folder in my documents.
- Add the resulting list of new graphics files to the XML file scan_images.xml. This file sits in xml/scans/.
- Upload that file to the database.
- Now you need to add links to the images into the existing documents. First, go to the MS Images page on the site to ensure that the new images appear as they should. Look at those images until you find one which should be linked to a transcription document, but doesn't yet have a link. This will be your test document for the success of the linking operation.
- Upload copies of all the correspondence documents that may be linked to these images into the test collection in the database. This is where we'll run the linking XQuery, so that we don't inadvertently damage the main document set.
- In eXist's Webstart client, bring up the query window, and copy-paste the query from adding_page_scan_links.xquery (which is in the root of the "Colonial Despatches" folder).
- Run the query, and check out the results. You will probably see many examples of documents not linked, but there will be many <done> elements signifying links successfully added. Save these results to a file in case you want to go through and analyse them later.
- Now back up the whole database (from /coldesp/ down) to your local computer. This is a useful thing to do anyway.
- In oXygen, go into the /test/ folder in the new backup, and validate all the documents with the coldesp.rng schema.
- If they all validate, things are looking good. Now open your test document, which previously wasn't linked to any page image, and see if a link has been added. You should find a tag that looks like this, in the teiHeader/fileDesc/sourceDesc/bibl:
  <biblScope type="startPageImage" facs="co_60_03/co_60_03_00112r.jpg"> co_60_03_00112r.jpg </biblScope>
  This indicates that the linking was successful.
- Now copy the newly-changed documents from the /test/ subfolder of your new backup over the original documents in your main collection on your hard drive, replacing the older documents with the newly-changed ones.
- Upload the changed files into the database on the server.
- Now check that the image link appears on your test transcription document, as it should, and that it links to the correct image.
- After this, the amount of checking and verification you do is going to depend on how much time you have. You can work through all the despatches that SHOULD now be linked, looking for any that aren't; and you could also work through the output from the XQuery operation, checking each instance where something was ignored to see if it should not have been.
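The file-list step near the top of this process (turning the output of dir into TEI <graphic> elements) could also be sketched in Python. This is not what the Transformer sequence actually does internally. The function name is hypothetical, and the url attribute and folder/filename path form are assumptions modelled on the facs value shown above, so check them against scan_images.xml before relying on the output.

```python
from typing import List
from xml.sax.saxutils import quoteattr


def filenames_to_graphics(listing: str, folder: str) -> List[str]:
    """Turn a plain list of filenames into one <graphic> element per JPEG."""
    elements = []
    for line in listing.splitlines():
        name = line.strip()
        # Skip blank lines and non-image entries such as list.txt itself.
        if not name.lower().endswith(".jpg"):
            continue
        # Assumed output shape: <graphic url="folder/filename.jpg"/>
        elements.append(f"<graphic url={quoteattr(folder + '/' + name)}/>")
    return elements
```

Calling filenames_to_graphics(open("list.txt").read(), "co_60_03") from inside the image folder would give a list of strings ready to paste into the XML file.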
That's what I've done this morning to add in the 1300-odd new images we have from the last CO_60_03 film.
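For the final checking stage, a rough Python sketch like the following could list the page-image links in a document from the backup, so that unlinked despatches show up as empty results. The function name is hypothetical, and it compares local element names only, on the assumption that the documents may or may not declare a TEI namespace.

```python
import xml.etree.ElementTree as ET
from typing import List


def page_image_links(xml_text: str) -> List[str]:
    """Return the facs values of all startPageImage biblScope elements."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # Strip any "{namespace}" prefix so the check works either way.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "biblScope" and element.get("type") == "startPageImage":
            facs = element.get("facs")
            if facs:
                links.append(facs)
    return links
```

Running this over every file in the /test/ folder of the backup and printing the names of files that return an empty list would give a quick list of documents to inspect by hand.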
Left a bit early, for an appointment.
No time to blog in detail, but the D table is up...
Posting time spent on fire drill.
Another user of the site wrote to request access to a hidden document. I think these requests will continue and perhaps grow, and it may not be a bad thing (it means people are using the site, and are aware that there's work that needs doing on it). I'm thinking that it might be a good idea to customize the page people see when they try to access such a document, so that it provides a link to JL's contact info, along with the URL of the page in copy/pastable format, so that he can easily decide what they need to do.
Another option is to provide a simple method JL can use to provide access to a page, without providing access to everything. I have an idea for that, but it might be complicated.
More helpful user feedback; the bug I'd found yesterday turned out to be not quite as I'd thought, so I did another quick fix.
Stayed late working on CityStats table layout and some apparent anomalies in calculations.