The changes to IMT file structure also broke the search functionality. I had to make changes to find.xq and search_results.xsl to accommodate the new @corresp and the hash in @facs and @corresp values. That also now seems to be working OK.
The transition from version 1.7 to 1.8 in the Image Markup Tool has introduced some new features and changes which needed to be migrated into the Mariage project files. This is the procedure I implemented to make the changes:
- Data files (IMT XML files) were converted over to the 1.8 format. This was done with the IMT's built-in XSLT conversion file, but I modified it with some special overrides as below.
- The result of this conversion would still have the wrong IMT version number in it, so that was updated by a modification to the XSLT conversion.
- The conversion also results in a file which has inadequate distinction between transcriptional and non-transcriptional annotations. This problem arises in two locations in the file: first, in the <tagsDecl> area where the transcription categories are stored, the Transcription category had to be designated as "Transcriptional" by the addition of a <desc> tag inside the <rendition> tag. This was also easy to do through a modification to the XSLT.
- The second problem component of the file is the linking, which by default ends up using @facs; all cases which are not transcriptional should be using @corresp instead. Another modification to the XSLT from IMT accomplished this conversion.
- This gave me valid IMT 1.8 data files. The problem now was that the XSLT on the site was expecting the old format files, so I had to modify imt_p5_to_xhtml. Any code that makes use of the @facs attribute value had to be rewritten so that it could use both @corresp and @facs, and was sensitive to the new presence of the hash at the beginnings of those values.
- Having converted all the data and the XSLT, I tested the XSLT by first uploading one new data file, and confirming that it failed to convert using the old XSLT; then uploading the new XSLT, and confirming that it worked with the new data file but failed with the old ones; and then by uploading all the new data files and checking a sample of them. Everything seems to work fine, including links into individual annotations.
- Finally, I archived the old data files, and overwrote copies on the server in the mariage account with the new versions.
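As a rough illustration of the linking change described in the list above, here is a minimal sketch in Python (not the XSLT that was actually used). The element and attribute layout is a simplified, hypothetical stand-in: the `annotations` wrapper, the `category` attribute, and the names `convert` and `link_target` are assumptions made for the example, and real IMT files are namespaced TEI documents with more structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for old-format annotation links.
OLD = """<annotations>
  <div category="Transcription" facs="area1"/>
  <div category="Commentary" facs="area2"/>
</annotations>"""

# Assumed set of categories that count as transcriptional.
TRANSCRIPTIONAL = {"Transcription"}


def convert(root):
    """Add the leading hash and move non-transcriptional links to @corresp."""
    for div in root.iter("div"):
        target = div.attrib.pop("facs", None)
        if target is None:
            continue
        if not target.startswith("#"):
            target = "#" + target
        attr = "facs" if div.get("category") in TRANSCRIPTIONAL else "corresp"
        div.set(attr, target)
    return root


def link_target(div):
    """Return the bare area id from @facs or @corresp, with or without a hash."""
    value = div.get("facs") or div.get("corresp") or ""
    return value.lstrip("#")


new_root = convert(ET.fromstring(OLD))
for d in new_root.iter("div"):
    print(d.get("category"), d.attrib, link_target(d))
```

The `link_target` helper mirrors the kind of change needed on the display side: code that used to read only @facs has to accept either attribute and ignore the leading hash.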
Everything seems to have gone smoothly. This is a tricky kind of task, especially on a live site, so it was good to do it when the office was empty and there were no distractions.
Fruitful project meeting to see where the transcription tasks have reached, and to plan our strategy for next semester. Decisions were mainly made for the two tag-teams, working on the Sonnet and the Amboise texts. The plan is basically that they'll finish up (any time soon), then they'll get together and examine each other's markup to look for inconsistencies, then we'll revise markup to incorporate all the best of both approaches. Then, the Sonnet team will begin tagging line equivalences, based on the 1609 text. The Amboise team will start adding <note> elements containing questions they'd like to see answered, along with possible suggested answers if they know them; these will form the basis of scholarly annotations. We need to categorize these notes by topic, using a type attribute. The other team will do the same, once they've tagged their line number equivalences.
Set up EM's workspace so she's editing directly on the server, and she worked through an example from a previous document to create a new header and frontispiece. Unfortunately, her many pages of previous transcription appear to have vanished; the file I backed up had no content in it other than the first line, and was timed at three hours before she finished on the 15th. It looks as though saving failed. However, she managed to get 18 pages transcribed again, and we now have solid backups of that data. I pushed it into the db to get some idea of how it would look; it's OK, but with such short lines, the left margin really ought to be moved over towards the centre a bit. I need to have a think about how pagebreaks are encoded, and what the default margins should be for prose text.
Spent some time getting the new RAs booked onto machines and working. EM is busy transcribing the second volume of the text that TG is working on; we haven't started on the XML with her text yet. AC is working on the Varin text. There we have a previous XML file started by CC which is not complete (it's missing the authorial annotations, which are actually more extensive than the poetry), but the latest transcription doc file seems to be complete, so AC is moving blocks of transcription from the doc file into the XML file and marking them up.
Things not yet done:
- Both of them are still working on local copies of files, until their accounts are added to the Mariage group. Have to remember to back up their stuff, and when the permissions are done, get their oXygen project files pointing at the files on the server.
- EM still needs to be started off with XML; that should be done next Wednesday, when she overlaps with TG, so that they can both learn together.
- AC and I need to meet with CC to figure out detailed plans for the Varin markup. For instance: all the poetry appears to be in italics except for some instances of plain, so we need an efficient approach to the markup to handle this, without using <hi rend="italic"> in every line. One approach is to use <lg rend="italic"> and then <hi rend="plain"> where non-italic bits appear.
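To show how the italic-by-default approach in the last item could be interpreted by processing code, here is a small Python sketch. It assumes that the nearest ancestor's rend value applies to each run of text; the sample lines are invented placeholders, not text from the Varin, and the function name `runs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder content only: an italic line group with one plain exception.
SAMPLE = """<lg rend="italic">
  <l>An italic line</l>
  <l>A line with <hi rend="plain">one word</hi> in roman type</l>
</lg>"""


def runs(elem, inherited=None):
    """Yield (rend, text) pairs, letting the nearest @rend win."""
    rend = elem.get("rend", inherited)
    if elem.text and elem.text.strip():
        yield rend, elem.text.strip()
    for child in elem:
        yield from runs(child, rend)
        # Text after a child belongs to the current element, not the child.
        if child.tail and child.tail.strip():
            yield rend, child.tail.strip()


result = list(runs(ET.fromstring(SAMPLE)))
for rend, text in result:
    print(rend, "|", text)
```

With this reading, only the exceptions need explicit markup, and every other run of text in the line group is treated as italic.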
Got TG set up on Arugula, initially transcribing the first half of one of EC's roman texts. Then CC brought down two more RAs who will start work next week, and we organized them. A will work on the Varin, and E on the second half of the roman.
The two latest images from the BN have been languishing ignored for a month, but I've now got around to dealing with them. I processed the image files to clean them up and create the three formats, then created new IMT files for them. To do this, I had to install the old 1.7 version of IMT on my machine, and avoid the current 1.8 development version. Then I did the actual transcription of the text, for a couple of reasons: to get something into the system so they can be added to the site, and to give myself more IMT experience ahead of finishing the new version off. Then I put the results into the db and checked them on the site. Waiting for any corrections to the transcriptions from CC.
TG will start work as a workstudy, digitizing EC's prose novellas, and doing some image markup.
LSPW and I spent some time setting up the XML document for her 1621 Sonnet de Courval task; we also had a discussion with CC about transcription policies, which gave rise to these decisions:
- Typographical ligatures will be ignored, because only some are available in Unicode, and they're actually stylistic rather than substantive.
- Ampersands will be kept (i.e. not expanded to "et") because they're easy and not a problem for the reader.
- Long s will be transcribed as long s, both in ligatures and where standalone.
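A transcription could be checked against these decisions with something like the following Python sketch. It is only an illustration under stated assumptions: it handles the precomposed ligature characters U+FB00 to U+FB06, replaces each with its component letters, and leaves ampersands and long s untouched. The names `LIGATURES` and `normalize` are hypothetical, and the sample string is invented.

```python
# Precomposed Latin ligatures (U+FB00..U+FB06) mapped to component letters.
LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "\u017Ft",  # long s + t ligature: the long s itself is kept
    "\uFB06": "st",
}
TABLE = str.maketrans(LIGATURES)


def normalize(text):
    """Split ligature characters; ampersands and long s pass through unchanged."""
    return text.translate(TABLE)


sample = "la \uFB01n & le re\uFB05e"
print(normalize(sample))
```

Unicode compatibility normalization (NFKC) is deliberately not used here, because it would also turn long s into an ordinary s, which goes against the third decision.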
Set up LW with access to the network folder, and added a range of obscure typographic characters to her favourites on the Character Palette. She's now working on transcription. If there's time, on Monday, I'll get the actual document set up with her, so that she can do transcription half the time and markup the rest.