Did some scanning for CC. It took longer than it should have, because Photoshop crashed, and we had to install the GIMP to get good results from the scanner.
LCC started work today, so I spent some time getting her going with the Varin transcription (no XML yet). In anticipation of this, I reorganized the folder structure on the server so that all the documents which the team is actively editing are under one directory, and thus easy to back up (previously they were all over the place, some in personal directories). I also went through the three editing machines and removed or updated links and project files to point to the correct locations, so although the old files are still there, each person should now work on the new copies without necessarily being aware of it, and I'll be able to back up new work easily.
With Delphi 2009 due to arrive soon, it's time to start planning the feature set and data structures for IMT 2.0. The overall plan is to make the application much more flexible; it needs to handle documents with multiple images, and to allow much more freedom in associating any "annotation" div with any zone on any image. These are my initial ideas:
- An IMT document is rooted on <TEI>, and contains:
  - A <teiHeader>
  - A <facsimile> element, which contains multiple <surface> elements, with each <surface> containing multiple <zone> elements, and each <surface> element containing one <graphic> element linking to an image file
  - A <text> element which contains a single <div>, which in turn contains a <div> for each annotation or transcription block. Each of these <div>s is associated (through @facs or @corresp) with one or more of the <zone> elements.
- Each image/<surface> is presented as a separate tab in the GUI, so you can move from image to image by clicking on the tabs.
- The annotations/transcriptions (we need a new word for this -- "zone-div"?) are presented in the current format, as a scrolling list, which can be filtered by category.
- Any zone-div can be associated with one or more <zone> elements, on one or more <surface>s. In other words, you can create a single zone-div which links to (say) three <zone>s on three different <surface>s. This enables efficient re-use of zone-div data, where features on different <surface>s need the same explanatory information. This breaks the current one-to-one correspondence between an annotation <div> and a <zone>, in IMT 1.8.
- Associations between <zone>s and zone-divs will be handled by giving each zone-div object a list in which to store links to <zone> elements (as opposed to the single pointer that currently exists). The zone-div parent list will have methods for finding associations by interrogating its children, in order to discover which zone-divs link to any given <zone>, or which <zone> elements are linked from any given zone-div.
- The above change will pose problems for the user interface. At present, when you click on a <zone> on the image, the associated annotation can be highlighted and displayed in the annotation window, because there is no ambiguity -- only one associated annotation exists. The same is true in reverse: when you click on an annotation in the annotation window, the associated <zone> can be highlighted. In the new system, not only can one zone-div be associated with more than one <zone>, but one <zone> can also be associated with more than one zone-div, so it's much more complicated to provide an efficient and helpful interface. This might be handled in the following way:
  - If you click on a zone on the image, and only one zone-div is associated with it, that zone-div will immediately be selected in the annotation window (retaining the simplicity of the current interface where the data structure allows it).
  - Similarly, if you click on a zone-div (annotation) in the annotation window, and that zone-div is associated with only one zone on only one surface, then that surface (image) will be shown, and that zone highlighted.
  - However, if you click on a <zone> with which multiple zone-divs are associated, a popup menu will appear, listing those zone-divs (by their tag-stripped titles), so that you can select one of them.
  - Similarly, if you select a zone-div in the annotation window which is associated with multiple <zone> elements, a popup menu will allow you to select which of them should be highlighted (listing them by image filename and coordinates).
- This takes care of navigating existing data, but a much more complicated problem arises with regard to creating such links in the first place. Since zone-divs and <zone>s are now only loosely coupled, we need to consider how a user might want or need to go about adding new <zone>s and zone-divs. These are some of the operations the GUI will have to enable (while, I hope, remaining simple and intuitive):
  - Adding a new <zone> and zone-div together, as currently; both are created at the same time, and are automatically linked. This should probably be the default behaviour, since most people will probably create most of their markup in this way.
  - Adding a new <zone>, but associating it with one or more existing zone-divs. Perhaps this would be Control + Add Zone, and would pop up a scrolling checkbox list of zone-divs in a modal dialog; if you select one or more and press OK, the <zone> is added and linked, but if not, the <zone> creation is aborted. When a <zone> is successfully created, the first zone-div of those associated with it would be selected.
  - Adding a new zone-div, but associating it with an existing <zone>. This might best be done using a right-click on the <zone> itself, and/or a right-click on the zone-div editing area. Another possibility is to have a drop-down list in the zone-div editing area, which has an entry for each associated <zone>, along with buttons to add and delete associations. Selecting an item in this list would foreground the <zone>.
  - Adding an association between an existing <zone> and an existing zone-div. This should probably also be done using a right-click on the <zone> element, and should also be available in the zone-div editing area.
  - Deleting an association between a <zone> and a zone-div. This could easily be done in the zone-div editing area, but it would be harder to do it through a right-click menu on the <zone> element, so perhaps this should be limited to the zone-div editing area.
  - Deleting a <zone> element. This would require pruning of all associations between zone-divs and the deleted <zone> (and might result in orphan zone-divs -- see below). The operation should also probably ask the user whether any zone-div which is ONLY associated with the <zone> which is to be deleted should also be deleted.
  - Deleting a zone-div. This is simpler than the above, because the associations are all stored in the zone-div object; but as above, it might result in orphan <zone> elements. This operation should also ask the user whether to delete any <zone> elements which are only associated with this zone-div.
  - Handling of orphans. If you can delete associations, then you can easily delete all associations between zone-divs and <zone>s, so any given <zone> may be left without any associated <div>, and vice versa. This is not a problem at all, in fact, but it may be something users should be warned about at the point where they create the orphan(s). The software as a whole needs to handle orphans of both kinds without problems.
  - Question: would it ever be necessary to allow the addition of a zone-div without an associated <zone>? It would be possible to create such a thing by adding a normal pair, then deleting the <zone>, but do we need to allow for direct creation of unassociated zone-divs?
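To make the many-to-many idea above more concrete, here is a minimal sketch of how the zone-div list might behave. It is written in Python rather than Delphi purely for illustration, and every name in it (ZoneDiv, ZoneDivList, click_zone, delete_zone) is hypothetical; zones are represented only by their id strings. It shows each zone-div holding its own list of zone links, the parent list answering both association queries by interrogating its children, the "select or pop up a menu" decision when a zone is clicked, and pruning of links when a zone is deleted.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneDiv:
    """An annotation block that stores its own list of linked zone ids."""
    div_id: str
    title: str
    zone_ids: list = field(default_factory=list)


class ZoneDivList:
    """Parent list; finds associations by interrogating its children."""

    def __init__(self):
        self.items = []

    def add(self, zone_div):
        self.items.append(zone_div)

    def divs_for_zone(self, zone_id):
        # Which zone-divs link to the given zone?
        return [d for d in self.items if zone_id in d.zone_ids]

    def zones_for_div(self, div_id):
        # Which zones are linked from the given zone-div?
        for d in self.items:
            if d.div_id == div_id:
                return list(d.zone_ids)
        return []

    def click_zone(self, zone_id):
        # One linked zone-div: select it directly. Several: offer a menu
        # of titles. None: the zone is an orphan.
        linked = self.divs_for_zone(zone_id)
        if len(linked) == 1:
            return ("select", linked[0].div_id)
        if len(linked) > 1:
            return ("menu", [d.title for d in linked])
        return ("orphan", None)

    def delete_zone(self, zone_id, delete_exclusive_divs=False):
        # Prune every link to the deleted zone. Returns the ids of zone-divs
        # that were linked ONLY to that zone; these are removed as well if
        # the user asked for that, and otherwise left as orphans.
        exclusive = []
        for d in list(self.items):
            if zone_id not in d.zone_ids:
                continue
            d.zone_ids.remove(zone_id)
            if not d.zone_ids:
                exclusive.append(d.div_id)
                if delete_exclusive_divs:
                    self.items.remove(d)
        return exclusive


divs = ZoneDivList()
divs.add(ZoneDiv("d1", "Watermark", ["z1", "z2"]))
divs.add(ZoneDiv("d2", "Marginal note", ["z1"]))
divs.add(ZoneDiv("d3", "Signature", ["z3"]))
print(divs.click_zone("z3"))  # ('select', 'd3')
print(divs.click_zone("z1"))  # ('menu', ['Watermark', 'Marginal note'])
```

Keeping the links on the zone-div side means deleting a zone-div needs no pruning at all, while deleting a zone has to walk the whole list; that matches the asymmetry described in the deletion items above.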
One more thing we need to think about is the question of metadata, both at the document level and at the level of <surface>. Where there are multiple images, each will perhaps require distinguishing data at the level of the surface tag, and we'll need to provide a good interface for that. At the same time, as the overall complexity of the document increases, people will probably want to add more complex information in the header. I don't know how we could make that easier, without actually creating an XML editor for it, so perhaps we can leave that as it is.
The most difficult aspect of the rewrite will be retaining, as far as possible, the simplicity of the current interface, so that existing users have little trouble adjusting, and are able to continue to work on simple, single-image documents in the way they're used to working, while making all the new options easy to access.
Finally, validation: do we want to think about that? It would be extremely hard to implement, especially with RNG schemas, but if it could be done, and errors could be retrieved and shown correctly in the GUI, it would be quite useful.
Assisting with/maintaining Lansdowne Speaker incoming paperwork.
Organizing/confirming demos for upcoming Humanities lab sessions.
In at 7.30, keeping the office open till 4.30 as I'm the only one here.
I'm in the process of building a system whereby editors can go to a specific page on the site, and from it, view all the possible output formats for every document in the db; they can also generate "hard" copies of each of these formats and store them on the server as a permanent record (in case the db or Cocoon become unavailable).
I made the following progress today:
- A sitemap pipeline is in place to use the Directory Generator to create a list of the documents already in the backup directory (which is backups/{styleGuide}).
- Another pipeline, called by backups.xml?styleGuide={styleGuide}, generates a TEI file which lists all the documents in the database by xml:id; it also has an XInclude instruction in it which is then processed, and that pulls in the directory listing from the above pipeline, creating an XML file which has a list of the contribution documents, along with a list of the details of all the "hard backup" copies that already exist.
- Another pipeline, backups.xhtml?styleGuide={styleGuide}, processes that through an XSL transformation, which is currently only in skeleton form but will eventually create a rich page with links to view and generate any or all of the hard backups available.
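The join that the final page needs (each document against the hard copies that already exist) can be sketched roughly as follows. This is Python for illustration only, not the Cocoon/XSLT implementation, and it rests on assumptions that are not settled above: that backup files are named {xml:id}.{extension}, and that the formats of interest are "pdf" and "xhtml". The function name backup_status is hypothetical.

```python
def backup_status(doc_ids, backup_filenames, formats=("pdf", "xhtml")):
    """For each document id, list the formats that already have a hard copy.

    doc_ids          -- xml:id values of the documents in the database
    backup_filenames -- file names from the backup directory listing
    formats          -- assumed output formats (hypothetical defaults)
    """
    existing = set(backup_filenames)
    return {
        doc_id: [fmt for fmt in formats if f"{doc_id}.{fmt}" in existing]
        for doc_id in doc_ids
    }


status = backup_status(
    ["doc1", "doc2", "doc3"],
    ["doc1.pdf", "doc1.xhtml", "doc2.pdf"],
)
print(status)  # {'doc1': ['pdf', 'xhtml'], 'doc2': ['pdf'], 'doc3': []}
```

A page built from this kind of mapping could show a "view" link for each format already present and a "generate" link for each one that is missing.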
So I guess I'm about a third of the way through this; the next stage will be generating individual items on demand, and the final stage will be generation of multiple items (all the PDFs, say, or all the XHTML files), along with a decent GUI reflecting progress and completion.
Wanted to purge any Windows desktops of old versions of IMT now that the Mariage project has been migrated to 1.8, so I booted up the two remaining ones, removed 1.7 and installed 1.8. I did updates on the Vista PC, and on the XP machine I discovered that SP3 had not been installed, so I let that do its thing; then there was another round of updates after that. I also booted up the Ubuntu desktop and let it update itself -- for some reason it can only do that from the command line. I figured I might as well do the rest of the public workstations, so I booted the two Macs and did all the updates on those too, including an excruciatingly slow Adobe update on the dual-monitor Mac. All five of those workstations are now updated.
The changes to IMT file structure also broke the search functionality. I had to make changes to find.xq and search_results.xsl to accommodate the new @corresp and the hash in @facs and @corresp values. That also now seems to be working OK.
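For illustration, the kind of lookup the hash forces on the search code looks roughly like this. The real changes were made in XQuery and XSLT; this is a Python sketch using only the standard library, and the sample document, its ids and the function names are all made up. It also assumes that a pointer attribute may hold several whitespace-separated "#id" values, and the sample is a bare skeleton rather than a schema-valid TEI file.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical skeleton: two surfaces, three zones, two annotation divs.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <facsimile>
    <surface xml:id="s1">
      <graphic url="page1.jpg"/>
      <zone xml:id="z1"/>
      <zone xml:id="z2"/>
    </surface>
    <surface xml:id="s2">
      <graphic url="page2.jpg"/>
      <zone xml:id="z3"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <div>
        <div xml:id="d1" facs="#z1 #z3"><head>Watermark</head></div>
        <div xml:id="d2" facs="#z2"><head>Signature</head></div>
      </div>
    </body>
  </text>
</TEI>"""


def pointer_ids(value):
    """Split a pointer attribute into bare ids, dropping one leading '#'."""
    tokens = (value or "").split()
    return [t[1:] if t.startswith("#") else t for t in tokens]


def zones_for_div(root, div_id):
    """Return the zone ids that the given annotation div points to."""
    for div in root.iter(f"{{{TEI}}}div"):
        if div.get(XML_ID) == div_id:
            return pointer_ids(div.get("facs"))
    return []


def surface_of_zone(root, zone_id):
    """Return the xml:id of the surface containing the given zone."""
    for surface in root.iter(f"{{{TEI}}}surface"):
        for zone in surface.findall(f"{{{TEI}}}zone"):
            if zone.get(XML_ID) == zone_id:
                return surface.get(XML_ID)
    return None


root = ET.fromstring(SAMPLE)
print(zones_for_div(root, "d1"))    # ['z1', 'z3']
print(surface_of_zone(root, "z3"))  # s2
```

The point is that the attribute value can no longer be compared directly with an xml:id; the hash has to be stripped (and the value tokenized) before matching.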