As the year wraps up, I thought I would jot a quick note as we head into the holidays.
I continue to add pb tags to the 1859 files. Along the way, I have discovered that we need to process a few more image batches, as follows:
CO 60/5
CO 60/6
CO 398/1
Together, these contain roughly 120 despatches from 1859.
I will ask Theo to work on this in the new year. Between the two of us, we should make short work of it!
I've re-focused on the task of generating and storing the OAI records in the database, in such a way that they can be updated easily whenever the db contents change. I've written a library called oai_update.xq, which has the original record-generating code from my first attempt, but massaged a bit so that it uses explicit namespace prefixes for TEI; this is necessary because we need to generate the record fragments in no namespace, so it's easier if we don't have a default one. I also fixed a couple of bugs which emerged when I tested my code on all 7000+ documents. This is what it does:
This is what it's not yet doing:
As I write this, I'm generating a set of OAI records for the whole up-to-date collection on my local copy of the machine. In the new year, I should be able to dump those and upload them into the live db to pre-populate it. Then I can add the feature above, write sitemap pipelines for the operations, and add them to my set of periodic update tasks. Finally, I can finish the OAI interface, which should be much simpler, since it's using existing records instead of querying source data and constructing records.
Reminder to self: the OAI docs are here.
Note to self: this is a simple, tested way of storing a document in the db:
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace xdb="http://exist-db.org/xquery/xmldb";
declare namespace util="http://exist-db.org/xquery/util";
(: xmldb:store takes the target collection as a path string, not the document nodes returned by collection() :)
let $doc := <doc><test>My test doc</test></doc>,
$coll := '/db/coldesp/oai/records/'
return xdb:store($coll, 'test.xml', $doc)
This will delete a document (with the same namespace declarations as above):
xdb:remove('/db/coldesp/oai/records/', 'test.xml')
This snippet will delete a document if it exists, then replace it:
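let $path := '/db/coldesp/oai/records/',
(: doc-available() tests for the document without raising an error if it's missing :)
$remove :=
if (doc-available(concat($path, 'test.xml'))) then
xdb:remove($path, 'test.xml')
else (),
$doc := <doc><test>My test doc 2</test></doc>
return
(: include $remove in the result so the removal is actually evaluated, not skipped as an unused variable :)
($remove, xdb:store($path, 'test.xml', $doc))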
Today I finished the implementation of the GetRecord response, which is very substantial indeed. I then started working on ListIdentifiers, and got to the point where I was able to start testing the execution time of some queries. The results demonstrate that it's going to be entirely impractical to generate this data on the fly. We're going to have to generate it in advance and store it, in the OAI record format, and then run the OAI queries against that collection. So this is what I'm now planning to do:
Create a collection called oai in the database.
Inside it, create two collections, one called meta and one called records.
In meta, store a document called sets.xml, which contains the entire ListSets response.
In meta, store a document called identify.xml, which contains the entire Identify response.
In meta, store a document called metadataFormats.xml, which contains the entire ListMetadataFormats response.
In records, store a generated full record for every correspondence document, possibly using the same xml:id attribute as on the original document.
Take the record-generating code out of the oai.xq file, and create a new library which does the following:
Simplify the oai.xq library so that it handles all requests using the data in the oai collection.
I've already implemented the three documents inside the meta collection, and simplified my oai.xq accordingly. Now I have to generate the records before I can start working on the query interface. It's pretty certain I'll have to use the resumptionToken functionality -- I'll perhaps feed out records in batches of (at most) 100. I'm going to encode the entire request in the resumptionToken so that I don't have to cache the query or results; that'll be simpler, and will obviate the need to periodically clear out the data from the cache.
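A minimal Python sketch of the "whole request in the token" idea, for the record. It assumes a token made of the URL-encoded request arguments plus a cursor, wrapped in URL-safe base64; the function names and token layout are hypothetical, not what oai.xq actually does, and the batch size of 100 is the one mentioned above. Because everything the server needs travels inside the token, nothing has to be cached or cleaned up.

```python
import base64
from urllib.parse import parse_qsl, urlencode

BATCH_SIZE = 100  # maximum records per response, as planned above


def make_token(args: dict, cursor: int) -> str:
    """Pack the original request arguments plus the next cursor into an opaque token."""
    payload = dict(args)
    payload["cursor"] = str(cursor)
    raw = urlencode(sorted(payload.items())).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def read_token(token: str) -> tuple[dict, int]:
    """Recover the request arguments and cursor from a token; no server-side cache needed."""
    raw = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    payload = dict(parse_qsl(raw))
    cursor = int(payload.pop("cursor"))
    return payload, cursor


def page(records: list, args: dict, cursor: int = 0):
    """Return one batch of records and the token for the next batch (None when finished)."""
    batch = records[cursor:cursor + BATCH_SIZE]
    next_cursor = cursor + BATCH_SIZE
    token = make_token(args, next_cursor) if next_cursor < len(records) else None
    return batch, token
```

One consequence of this design: a harvester can't see inside the token, so its layout can change later; a token that fails to decode would be answered with the OAI-PMH badResumptionToken error.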
I've started on the update script, and I'm going to use code like this example to store the documents.
This won't be finished until next year.
Progress so far:
ListSets response, which has quite a range of sets.
GetRecord, which calls out to a getDocRecord() function that does the real work.
getDocRecord() function; it can return a header (fulfilling the needs of ListIdentifiers), and I'm just getting started on the Dublin Core metadata output in the <metadata> tag.
ListMetadataFormats (easy, because we're only supporting oai_dc).
One thing I'm currently undecided on is whether I should bother with the resumptionToken functionality. If I do, that means I'll have to cache the parameters of the request in the db somewhere and retrieve them in response to the token, which is a bit of a pain; I'm more inclined to let the whole thing run, and only worry about the resumption token if it seems likely that the results will be too large to handle.
A further 983 page-images have been added to the manuscript image browser, covering British North America correspondence from 1859 (Individuals). Transcriptions are now being linked into these images. CO 6 vol 30 is close to completion too.
I've started implementing some back-end XQuery to respond to requests from OAI-PMH harvesters, according to the specifications and guidelines here. I'm intending to implement the baseURL as bcgenesis.uvic.ca/oai.xq, and handle all requests through a single XQuery library, which I've begun writing. So far I've implemented verb checking, return of passed arguments in the request element, and the UTC response date-time. Most of the time so far has been spent wading through the spec, which is predictably meticulously unilluminating, but there are examples, and it looks straightforward. My projected implementation of identifiers is going to look like this: oai:bcgenesis.uvic.ca:B63030SP.scx.xml, where the last component contains the @xml:id attribute of an XML file in the database, and the final suffix dictates the format (XML or XHTML) of the resource, so we can treat XHTML and XML versions of the data as separate items. I think this makes sense, although my plans may change through the process of implementation.
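A rough Python sketch of how an identifier in that scheme could be taken apart. It assumes that everything between the repository prefix and the final suffix is the @xml:id, and that only .xml and .xhtml suffixes are recognised; the function and constant names are hypothetical. idDoesNotExist is the standard OAI-PMH error code for unknown identifiers.

```python
PREFIX = "oai:bcgenesis.uvic.ca:"

# Assumption: these are the only two suffixes, mapping to the two formats of each item.
FORMATS = {"xml": "XML", "xhtml": "XHTML"}


def parse_identifier(identifier: str) -> tuple[str, str]:
    """Split e.g. oai:bcgenesis.uvic.ca:B63030SP.scx.xml into (xml:id, format)."""
    if not identifier.startswith(PREFIX):
        raise ValueError("idDoesNotExist: unknown repository prefix")
    local = identifier[len(PREFIX):]
    # Only the last dot separates the format suffix; earlier dots stay in the id.
    doc_id, sep, suffix = local.rpartition(".")
    if not sep or suffix not in FORMATS:
        raise ValueError("idDoesNotExist: unrecognised format suffix")
    return doc_id, FORMATS[suffix]
```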
I'm proposing to use sets for document type, year, and possibly others, with a hierarchy of type:year; this also may change. I'm hoping this won't take too long to implement, given that we're already spitting out pretty comprehensive Dublin Core for all the transcription documents, but handling the personography and other modern data may be more problematic.
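OAI-PMH expresses set hierarchy with colons in the setSpec, so type:year fits naturally. This Python sketch uses hypothetical type names and assumes that a request for a parent set (the type alone) should also return the members of its year subsets:

```python
def set_specs(doc_type: str, year: int) -> list[str]:
    """Return the setSpecs a document belongs to under a type:year hierarchy."""
    return [doc_type, f"{doc_type}:{year}"]


def in_requested_set(doc_specs: list[str], requested: str) -> bool:
    """True if the document is in the requested set or in one of its subsets."""
    return any(s == requested or s.startswith(requested + ":") for s in doc_specs)
```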
CP and I spent some time this morning trying to figure out the history of incoming despatches and their attachments, in an effort to figure out what microfilms we need to order and digitize next year. We have pieced together a likely scenario that explains some of what we're seeing:
A further 817 page-images have been added to the manuscript image browser, covering British North America correspondence from 1858 (Individuals, N-Z). Transcriptions are now being linked into these images.
The stats on Megapode seem to have been backfilled with stats from previous times, going back to February 2009, so I've now retrieved the six stat blocks I'm tracking for both 2009 and 2010 up to Nov 30. The aim is to have one file for each year, but just in case stats end up getting lost, as has happened in the past, I download the year-so-far stats at the end of every month.
A further 814 page-images have been added to the manuscript image browser, covering British North America correspondence from 1858 (Individuals, F-M). Transcriptions are now being linked into these images.
Long way to go yet, but that's already more than one abbreviation per document in the collection, on average.
Another relatively straightforward one. Regexp:
(?<!abbr>)([Rr])(ec?<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1eceived</expan></choice>
Very common, especially in the name of the HBC. Preceding components of that name display remarkable variation, so most of those will probably end up being done manually. Regexps:
(?<!abbr>)([C])([o]?<hi rend="[^"]+super+[^"]+">y\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1ompany</expan></choice>
Done with regexps:
(?<!abbr>)([Dd])(esp<hi rend="[^"]+super+[^"]+">h</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1espatch</expan></choice>
This is common, and I suspect there may be other variants I'll catch later on. This is the regexp:
(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">h\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1hich</expan></choice>
Two strategies for this: a regexp for the superscript form, and a simple case-sensitive search and replace for the plain one:
(?<!abbr>)([A])(tt<hi rend="[^"]+super+[^"]+">y\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1ttorney</expan></choice>
Att. <choice><abbr>Att.</abbr><expan>Attorney</expan></choice>
Using this regexp:
(?<!abbr>)([Gg])(en<hi rend="[^"]+super+[^"]+">l\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1eneral</expan></choice>
Used these regexps to mark up instances of "Secy" and "Secty" for "Secretary":
(?<!abbr>)([Ss])(ec[t]?<hi rend="[^"]+super+[^"]+">y\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1ecretary</expan></choice>
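Since these regexps get re-run whenever files are reprocessed, it may help to keep them in one list and apply them in sequence. Here's a Python sketch that holds a few of the patterns above copied verbatim and converts their $1-style backreferences into Python's \1 syntax; the helper names are hypothetical. The (?<!abbr>) lookbehind is what makes a second run harmless.

```python
import re

# A few of the (pattern, replacement) pairs above, copied verbatim.
RULES = [
    (r'(?<!abbr>)([Rr])(ec?<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?)',
     r'<choice><abbr>$1$2</abbr><expan>$1eceived</expan></choice>'),
    (r'(?<!abbr>)([C])([o]?<hi rend="[^"]+super+[^"]+">y\.?</hi>\.?)',
     r'<choice><abbr>$1$2</abbr><expan>$1ompany</expan></choice>'),
    (r'(?<!abbr>)([Ss])(ec[t]?<hi rend="[^"]+super+[^"]+">y\.?</hi>\.?)',
     r'<choice><abbr>$1$2</abbr><expan>$1ecretary</expan></choice>'),
]


def to_python_replacement(repl: str) -> str:
    """Turn $1, $2 ... backreferences into Python's \\1, \\2 ..."""
    return re.sub(r"\$(\d)", r"\\\1", repl)


COMPILED = [(re.compile(p), to_python_replacement(r)) for p, r in RULES]


def apply_rules(text: str) -> str:
    """Apply every rule in order; instances already inside <abbr> are skipped by the lookbehind."""
    for pattern, repl in COMPILED:
        text = pattern.sub(repl, text)
    return text
```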
A handful of new documents has come in from JH, based on transcriptions from MF B-3006 through B-3008 (CO 6/19 through CO 6/23). These MFs have not been processed yet, so I'd like to wait on doing the markup until they have, but I took a look through the documents and found a couple of anomalies: one document mislabelled 6/25, and a misspelled name. The name issue led me to a similar error in one of our existing transcriptions, B585HB06, which I've now corrected.
Just for the record: I'm backing up all the processed images from /home1t/coldesp/www/jpg_scans/ to Rutabaga, by ssh-ing into nfs.tapor.uvic.ca and running:
rsync --stats --recursive --times --delete --verbose -e ssh jpg_scans/ "mholmes@rutabaga.hcmc.uvic.ca:/home/mholmes/backups/Martin/Colonial\ Despatches/www/jpg_scans/"
This needs to be run periodically to keep the backup up to date. home1t is, of course, backed up routinely, but these images represent such an investment of time that I'd like to keep multiple copies. I'm also going to look at making a Rutabaga backup of /home1t/coldesp/archive/, which contains all the original image sets from which these were generated.
Meeting with CP and KSW about the grant report, which is due now, and about future plans.
Spent an hour putting together some background and some ideas regarding the grant application, in the form of an email to KSW, NB and RP. I won't put it up here since it'll need a lot of work yet, but it's a reasonable start on the part of the grant application that will fall to us.
Added PB tags to 17 files from 1858 that required CO 6/27 images.
This is volume 3 of the 1858 material, covering British North America 1858 offices: General and Individuals A-E. There are 1421 new page-images.
Met with MP, who may be interested in doing some work on the Despatches maps, if he's not snapped up by the MoM project.
The following is a list of maps that we wish to add to the despatches-site map collection:
Eleven files updated (listed in KSW's post).
This is part of the back-tracking work required for the files missing their corresponding images at the time of their original upload and proof. The following files were updated:
V585AD12.xml
V585FO02_A.xml
V585AD13.xml
V585AD14.xml
V585AD04_A.xml
V585AD08.xml
V585AD01_A.xml
V585AD09.xml
V585AD10.xml
V585AD11.xml
V475HB02.xml
A further 1,123 page-images have been added to the manuscript image browser, covering British North America correspondence from 1858. Transcriptions are now being linked into these images and the CO 6/25 set.
Took the latest stats from megapode (Urchin 6). They're still distorted by the Intermapper hits, so GN and I discussed using another "canary" -- possibly a web app just for that purpose -- to monitor our Tomcats, rather than bcgenesis. But the stats are rather a mess anyway because of the gap caused by moving servers (Sept 1 - Oct 4) and the fact that the old stats were Urchin 5 and these are Urchin 6. Very annoying. I don't think it's worth putting the work into integrating them and cleaning out the Intermapper hits unless somebody specifically asks for the stats.
Just a quick note to say that I am more than halfway through CO 6/26; the remainder should be completed early next week. Once this batch is complete, I will go back and link the earlier XML files that were waiting on it, along with the Volume 25 collection, to their images.
1230 new page-images have been added to the site from Colonial Office Series 6 Vol 25 (British North America 1858 correspondence).
Running a beta of IE9, I noticed that the AJAX functions on the site (retrieving bios, places etc.) weren't working. It turned out that there's something badly wrong with IE's DOM2 support. If you ask it whether it supports document.importNode, it says yes; but if you actually call document.importNode, it responds with "No such interface supported". I was using document.importNode to insert XHTML retrieved with AJAX into the page.
The conventional wisdom is that, rather than trying to detect browser versions, you should test for function support; but when the browser lies about its support for a function, you have to fall back on detecting the browser itself. So I've now added a test for MSIE in the userAgent string to detect IE, and in that case fall back to using innerHTML. This also works OK for IE8.
Following instructions from CT on the TEI list yesterday, I've implemented a system for serializing TEI XML with the new TEI mime type. A selector checks whether the browser accepts that type, and falls back to text/xml if it doesn't. These are the sitemap details:
In <map:serializers>:
<!-- MDH: added this serializer to allow for UTF-8 XML output with TEI mime type. -->
<map:serializer mime-type="application/tei+xml" name="utf8tei" src="org.apache.cocoon.serialization.XMLSerializer">
<encoding>UTF-8</encoding>
</map:serializer>
In <map:selectors>:
<map:selector name="accept-content-type"
src="org.apache.cocoon.selection.RegexpHeaderSelector">
<pattern name="tei">application/tei\+xml</pattern>
<header-name>accept</header-name>
</map:selector>
After <map:components>:
<map:resources>
<map:resource name="serialize-tei">
<map:select type="accept-content-type">
<map:when test="tei">
<map:serialize type="utf8tei"/>
</map:when>
<map:otherwise>
<map:serialize type="utf8xml"/>
</map:otherwise>
</map:select>
</map:resource>
</map:resources>
And in actual pipelines, use <map:call resource="serialize-tei"/> instead of <map:serialize type="utf8xml"/>. For example:
<map:match pattern="getDoc.xml">
<map:generate src="xq/doc.xq" type="xquery"/>
<map:transform type="saxon" src="xsl/highlight_matches.xsl">
<map:parameter name="browserURI" value="{request:requestURI}?{request:queryString}"/>
<map:parameter name="queryString" value="{request:queryString}"/>
</map:transform>
<!--<map:transform type="saxon" src="xsl/add_xml_stylesheet.xsl" />-->
<map:transform type="xinclude"/>
<map:transform type="session"/>
<map:transform type="encodeURL"/>
<map:call resource="serialize-tei"/>
</map:match>
For Firefox, you can set the browser to handle the TEI mime type in preference to the text/xml alternative by changing network.http.accept.default in about:config. This is my setting:
text/html;q=0.7,application/xhtml+xml;q=0.7,application/xml;q=0.7,text/xml;q=0.8,application/tei+xml;q=0.2,*/*;q=0.1
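"q" values range from 0 to 1, with higher values meaning higher priority. Note that in this setting application/tei+xml (q=0.2) actually ranks below text/xml (q=0.8); as far as I can tell that doesn't matter here, because the regexp selector only checks whether application/tei+xml appears in the Accept header at all.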
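The branching in the serialize-tei resource can be modelled in a few lines of Python, with the Firefox header above as a test case. This assumes the selector looks for its pattern anywhere in the Accept header, as the unanchored pattern suggests; choose_serializer is a hypothetical name, and utf8tei is the serializer name the resource refers to.

```python
import re
from typing import Optional

# Same pattern as the <pattern name="tei"> entry in the accept-content-type selector.
TEI_PATTERN = re.compile(r"application/tei\+xml")


def choose_serializer(accept_header: Optional[str]) -> str:
    """Pick the serializer the serialize-tei resource would use for a given Accept header."""
    # Like a bare regexp test, this ignores q-values entirely.
    if accept_header and TEI_PATTERN.search(accept_header):
        return "utf8tei"
    return "utf8xml"
```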
To decide what happens to the mime type when the browser encounters it, you have to let the browser encounter it (unless you install the MIME Edit extension, which gives you actual control over mime type handling).
Other browsers are more problematic. This system works on the basis that the browser provides a prioritized list of acceptable mime types to the server (Cocoon), which can then serve application/tei+xml if the browser handles it. However, Chrome does not allow you to configure acceptable mime types, so it doesn't seem possible to make it announce to Cocoon that application/tei+xml is acceptable; therefore Cocoon will never deliver application/tei+xml to Chrome. If it were possible to deliver the correct mime type to Chrome, it would just hand it off to the OS or desktop (Gnome etc.) to deal with. Opera does allow you to configure what it will do with mime types, so you can set (for instance) application/tei+xml to open with oXygen instead of being displayed in the browser, but this does not appear to affect the list of mime types Opera sends to the server, so Cocoon is still delivering text/xml (as far as I know).
IE8 appears to do all file handling based on file extensions, which is not really helpful; there are hacks you can do in the registry, but it's not clear to me that they would succeed in achieving anything unless the file were delivered with a specific unique extension. IE9 beta is no different in this respect.
KSW found that some edited files from 1851 were further along in their editing process (addition of markup and pagebreak tags) than the ones in SVN, so we had to replace the SVN copies with the older ones; I then had to re-run the automated abbreviation markup regexps I'd (thankfully) documented carefully on the blog. We also found one file which had the wrong reference info (it was marked as CO 305 03 when it's actually in the War Office records), and another file which was a duplicate of the following file. All fixed.
Colonial Office volume 60/4 page images have now been processed and are available on the website. These cover BC despatches to London from 1859.
Had a discussion with four faculty members from the English dept, out of which we hope some projects relating to the Despatches will emerge.
Couldn't size SVG images reliably, so turned them into PNGs. Tested on my laptop connected to the projector via HDMI-to-DVI, which works great.
JH wants copies of all the page-images on a 1TB hard drive he's supplied, so I'm copying them over. I had to reformat the drive, which came with some Windows software and an odd partition structure which made it unusable. I'll have to leave some of the copy operations going overnight, but I'm hoping they can be done by tomorrow.
Finished the presentation, then GN set up the projector, which now has cables attached and ready, and we tested it. It'll probably need some reconfiguration tomorrow morning when I bring in my laptop, because the SVG graphics aren't easy to control without using pixel sizes, but other than that we should be good to go tomorrow.
KSW and I have been working on a presentation for Wednesday, when we have some researchers coming who might be interested in projects relating to the Despatches. I've been working on it for most of the day. Still not sure exactly what to pitch and how to pitch it...
Added the markup for these abbreviations manually, because there seems to be a bug in oXygen's handling of the replace operation when the replacement refers to a captured group and the pattern contains a lookbehind assertion. There were only 64; I was able to find them with a regex search in oXygen, but had to mark them up manually.
This was the regex:
(?<!abbr>)([Hh])(on<hi rend="[^"]+super+[^"]+">ble\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1onourable</expan></choice>
Now looking at the variety of preceding "R", "Rt" etc. for "Right Honourable".
I've implemented the following regex replaces:
(?<!abbr>)([Ss])(h<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1hould</expan></choice>
(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1ould</expan></choice>
For the record, the second of these would find the following:
w<hi rend="vertical-align: super; font-size: 80%;">d</hi>.
and mark it up like this:
<choice><abbr>w<hi rend="vertical-align: super; font-size: 80%;">d</hi>.</abbr><expan>would</expan></choice>
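The same replacement can be checked outside oXygen. This Python sketch uses the "would" regexp above unchanged, except that Python writes backreferences as \1 and \2 rather than $1 and $2; the test case is an instance followed by a full stop, which the trailing \.? pulls inside the <abbr>. mark_up_would is a hypothetical name.

```python
import re

# The "wd" regexp from above; only the backreference syntax differs in the replacement.
WOULD = re.compile(r'(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?)')
REPLACEMENT = r'<choice><abbr>\1\2</abbr><expan>\1ould</expan></choice>'


def mark_up_would(text: str) -> str:
    """Wrap each unmarked superscript 'wd' in <choice>/<abbr>/<expan>."""
    return WOULD.sub(REPLACEMENT, text)
```

Running it twice gives the same result, because the (?<!abbr>) lookbehind skips instances that are already marked up.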
Changes are being uploaded to the database now. Committed a fresh revision to SVN after each operation.
The following are good to go:
(?<!abbr>)([Ss])(h<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1hould</expan></choice>
(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">d\.?</hi>\.?) <choice><abbr>$1$2</abbr><expan>$1ould</expan></choice>
This is an expression for matching "wh", which as far as I can tell is always expanded to "which"; but there are 1128 instances of it in the documents, so we should look at as many of them as possible to confirm that this is consistent (and that it doesn't sometimes stand for "who", "whom", "what" etc.):
(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">h\.?</hi>\.?)
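One way to review the instances is to print each match with a little context on either side and read through them. A Python sketch, assuming the documents are available as strings (reading them from disk is left out); wh_contexts is a hypothetical helper, and the 40-character window is arbitrary. The context is raw markup, not cleaned-up text.

```python
import re

# The "wh" search expression from above, unchanged.
WH = re.compile(r'(?<!abbr>)([Ww])(<hi rend="[^"]+super+[^"]+">h\.?</hi>\.?)')


def wh_contexts(text: str, width: int = 40) -> list[str]:
    """Return each unmarked 'wh' abbreviation with `width` characters of raw context either side."""
    return [text[max(0, m.start() - width):m.end() + width] for m in WH.finditer(text)]
```

For example, `for line in wh_contexts(doc_text): print(line)` over each file gives a list that can be skimmed for any "wh" that isn't "which".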
I've marked up instances of abbreviations of "government" and "governor", and I'm uploading them into the database. I've also set up a system whereby a script makes a copy of the core correspondence files (currently sorted into folders by their respective years) into a single correspondence folder on my hd, so I can upload more conveniently into the db collection.
Met with the team to discuss some of JMH's concerns about the way our markup is being done. Made some modifications to the print stylesheet, which had got out of date and was showing images and irrelevant underlining.
JMH is concerned that, although we have accurately transcribed the superscripted abbreviations of common words such as "Govt" that appear all over the place, we don't expand them. It has occurred to me that we could enable expansion of these quite easily using a search-and-replace, like this:
[Gg]ov<hi rend="[^"]+super+[^"]+">[^<]*t</hi>
which finds all the various abbreviations for "government" without finding those for "governor". We can do this, for a range of common abbreviations. I've already written some XSLT to make them into mouseovers as we do with the <choice>/<sic>/<corr> sets.
The above could be captured and used as a backreference with <abbr> wrapped around it, with the following replacement:
<choice><abbr>$0</abbr>
<expan>Government</expan></choice>
Initially I thought it might be simpler to replace instances beginning with a capital and those beginning with a lower-case letter separately, so that we can provide the accurate expansion in each case, but see below.
EDIT: I've refined the regex so that it won't operate on an instance that has already been processed, since we'll probably have to run it on files multiple times. This seems to be the best way to do it, using a negative lookbehind assertion, and capturing the first letter separately so we can reproduce capital or lower-case:
(?<!abbr>)([Gg])(ov<hi rend="[^"]+super+[^"]+">[^<]*t</hi>)
<choice><abbr>$1$2</abbr><expan>$1overnment</expan></choice>
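A quick way to do that testing is to run the pattern over a few known cases. This Python sketch uses the regex above unchanged and puts \1overnment in the expansion, so that the captured first letter carries its case through, which is the point of capturing it separately; expand_govt is a hypothetical name. It checks that "Govt" is expanded, "Govr" (governor) is left alone, and a second run changes nothing.

```python
import re

# The refined "Govt" regex from above; \1 is the captured G or g.
GOVT = re.compile(r'(?<!abbr>)([Gg])(ov<hi rend="[^"]+super+[^"]+">[^<]*t</hi>)')
REPL = r'<choice><abbr>\1\2</abbr><expan>\1overnment</expan></choice>'


def expand_govt(text: str) -> str:
    """Mark up unprocessed superscript abbreviations of "government", preserving initial case."""
    return GOVT.sub(REPL, text)
```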
This seems to be working, but I'll need to do some more careful testing before setting it loose.
Colonial Office class 305 volume 13 page-images, covering Public Offices correspondence from 1859, are now available on the Colonial Despatches site. They can be accessed through the Manuscript images page.
The images in CO 60 / 4 [found in folder B-080], and possibly more, are photographed in reverse order. For example, page 1 actual is image number B-080-00788.jpg, page 2 actual is B-080-00787.jpg, page 3 is B-080-00786.jpg, and so on.
For the purposes of image processing, I downloaded the Volume 4 files and used KRename to rename them in a more sensible fashion.
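KRename did the job, but for the record, the same renaming could be scripted. This Python sketch assumes the B-080-NNNNN.jpg naming shown above and that the highest image number is page 1 throughout the folder; the page_0001.jpg target names and the function names are hypothetical, not the scheme actually used. It defaults to a dry run so the mapping can be checked first.

```python
import re
from pathlib import Path

# Camera-order file names such as B-080-00788.jpg (folder prefix, then a 5-digit image number).
IMAGE_NAME = re.compile(r"^B-\d{3}-(\d{5})\.jpg$")


def reversed_page_names(filenames: list[str]) -> dict[str, str]:
    """Map each image file to a page-order name, treating the highest image number as page 1."""
    images = [(int(m.group(1)), name) for name in filenames if (m := IMAGE_NAME.match(name))]
    images.sort(reverse=True)
    return {name: f"page_{page:04d}.jpg" for page, (_, name) in enumerate(images, start=1)}


def rename_folder(folder: Path, dry_run: bool = True) -> None:
    """Print (and, unless dry_run, perform) the renames for one folder."""
    for old, new in reversed_page_names([p.name for p in folder.iterdir()]).items():
        print(f"{old} -> {new}")
        if not dry_run:
            (folder / old).rename(folder / new)
```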
Setting up a new Subversion repository so that multiple users can work on the same files at the same time. I didn't document this carefully enough last time, so I'll go through this in detail:
svn mkdir https://[path-to-repo]/trunk
svn mkdir https://[path-to-repo]/trunk/xml
svn checkout https://[path-to-repo]/trunk/xml .
svn stat | grep "^?" | awk '{print $2}' | xargs svn add
This command is suggested and nicely explained by this article.
svn commit
CO 305, Volume 12 has been uploaded to the Manuscript Images page. This collection contains 1859 correspondence relevant to Vancouver Island, under the headings of Admiralty, Board of Trade, Council Office, Emigration Office, and Foreign Office. In the coming months, these images will be linked to their respective digital transcriptions.
CO 305, Volume 11 contains correspondence for the year 1859. You can access these images, along with all others posted to date, from our Manuscript Images page. In the coming months we will link these images to their respective digital transcriptions.
The title was wrong. Created my own, since the NAC metadata was the source of the error.
We have made available two separate RSS feeds for the Colonial Despatches site. One is for announcements only; it can be accessed from an icon in the footer of every page. The other is for all the project blog postings, and this is available on the Development page (click on About on the main menu).
CP wrote to point out that we had included a map from NSW in our collection by mistake, because the NAC had sent it to us in error. I've now removed it. Also, we had a wrong page ref in one of the "press coverage" items (now fixed), and one of our maps, FO925-1383, appears to have the wrong title information. I got the info from the spreadsheets supplied by the library, so the error originates there; I've asked CP if he can find out the correct information.
He also requested that we link to the library's map collection, but before I can do that, I need a list of mappings between the metadata I have (such as <idno type="libFileName">FO925-1383</idno>) and the collection and item numbers in the library's CONTENTdm system. That hasn't been available so far.
Scans from CO 305/11 are now accessible in the image browser.
I completed all the 1859 images in the CO 305/11 collection. I have processed the 800px and 60px images and informed Martin, so that he can upload them to the site. I will now move on to CO 305/12.
Met with KSW, JL, and CP to discuss the next phase. Discussed possible involvement of other faculty, possible grant sources, editorial policy, and the priority sequence for work in this phase.
Went through the IB grant application with KSW in preparation for tomorrow's meeting.
Processed hundreds of images for the Volume 11 collection; I'm currently at 383 of 433 original images, and I predict that I will complete the remainder by next Monday afternoon.
KSW has re-processed the scans to get better quality images, and re-linked any 1859 files that were linking to them (only a few dozen). I've uploaded the new image files, amended the scan list in the db, and uploaded the changed documents.
Once a month I grab the stats for the preceding months, for grant reporting purposes.
There are two versions of the govlet site on hist66, one (the current one) in /BCCOR/, and the other (well obsolete) in /govLetter/. Google has indexed the latter, so I've added a redirect to each of the index pages in /en/ and /fr/, changing them from .html to .php. I initially tried to use a .htaccess redirect for the whole /govLetter/ directory, but this doesn't work -- perhaps .htaccess redirects are not allowed on unix.uvic.ca, or perhaps there was something wrong with my syntax.
Also wrote to JS and sysadmin to get the virtual host set up on Lettuce, and the domain name pointed at it in the UVic DNS.
Just in time for one of the Rutabaga drives to fail, so it'll be coming down this evening...
During development, the govlet.ca site was hosted on web.uvic.ca, in the hist66 account, but now that development is complete, I've moved it over to home1t/govlet, which is its long-term home. The domain is currently pointing at the hist66 account; once we're all happy it's working properly in the new location, I'll get sysadmin to re-point it to home1t.
Found and fixed two errors, one an unescaped ampersand in a @title value, and the other a French menu included in one of the English pages. I haven't tested laboriously -- there are a lot of pages -- but I've fixed everything that I've seen so far. Waiting for JL's approval to ask for the domain change.
I've been working on sorting and grouping the First Nations names in a logical manner, and I've basically got the code working (on my local machine), except that the leading "the" is problematic -- sometimes it's "The", sometimes "the", and I'm basically convinced that it shouldn't be there at all. The tags in the documents need to be changed from:
<name type="fn">The [whatever]</name>
to
The <name type="fn">[whatever]</name>
for both upper- and lower-case versions of the article. I think I can do this with a simple search-and-replace.
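A sketch of that search-and-replace in Python, assuming the tags always appear exactly as <name type="fn"> (no extra attributes, same attribute order) and that the article is followed by whitespace; move_article_out is a hypothetical name, and the example names are just for illustration. Requiring whitespace after the article leaves alone any name whose first letters happen to be "The".

```python
import re

# A leading "The"/"the" plus following whitespace, just inside a First Nations name tag.
LEADING_THE = re.compile(r'<name type="fn">([Tt]he)\s+')


def move_article_out(text: str) -> str:
    """Move the article outside the tag, keeping its original case."""
    return LEADING_THE.sub(r'\1 <name type="fn">', text)
```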
One new article was reported to me, but searching myself, I was able to find two other new ones, and I was also able to find web links for many of the older articles, which had previously been unlinked. Also I was intrigued to find that we're cited in the Wikipedia article for Pelly. Updated the About page.
Also published the First Nations groups index page. I still need to do two things with this:
Here's a brief summary of where we're at:
The places file is complete and stands at 184 entries; so too is the vessels file, at 88 entries.
As for the other XML files, I have published this sheet that details our progress to date.
I shall return in early September to work on 1859.
We've just received the following new scans requested by CP (nearly 100GB in total). I've copied them locally, and to home1t/coldesp/archive, and will copy them also to Rutabaga tomorrow. In the process, I moved a few other bits and pieces into home1t/coldesp/archive, to tidy up the coldesp directory a little.
CO 60, 1859-1871, vols. 4-44 and an ancillary reel
CO 305, 1859-1866, vols. 11-30
CO 398, 1858-1866, vols. 1-7
Long meeting at the library for discussion of a possible application to Canada Interactive Fund.
Downloaded all new material from the working account and updated the files in eXist.
Got the latest set of Coldesp stats from Urchin 5 (webstats.uvic.ca). Urchin 6 is available on http://megapode.comp.uvic.ca:9999, but it doesn't provide the same range of stats in the same formats as I've been downloading, so I'm sticking with 5 until I'm forced to change.
I continue to puzzle out the inconsistencies in the vessels files. Only a few more entries to complete for those that had no content. From there, I will do a quick copyedit of the remaining files, then wait to do a final proof until I can see them live on the site.
This week saw me complete the following tasks:
Looking ahead past the long weekend, I will continue with the vessels file. I will focus on adding content to the entries that lack it, even if it is drawn exclusively from the despatches. Each vessel write-up has the potential to become a project unto itself! So, I won't spend too much time on the file after that: I'll give it a proof and, hopefully, turn it over to a research student in the fall for tweaking.
I have completed content for all the placenames to date. There are some loose strings to snip, but the bulk of the labour is done, which puts us at 180 entries. As the placenames have taken on a life of their own, in terms of scale, I am concerned about errors.
So, I have created a checklist to confirm that each entry has (1) content, (2) sources, formatted correctly, (3) correct geo-coordinates, and (4) a final proofing. I figure that it will pay to do this work now, rather than have our dear readers point out our errors later! It shouldn't take too long, as much of it is done already.
Greg has been a fine chap and set me up on a new machine. I have spent some time, as needed, getting it set up. I will be using Grsync to do nightly backups to my server space on Home1t, rather than always dragging and dropping. Good times!
I have whittled the remaining placenames down to a dozen to go! I hope to complete these by next Tuesday at the latest. I was distracted, positively, by the task of popping out to pick up UVic Ceremonies' pictures from the Colonial Despatches launch at Government House. I have posted the pictures to the coldesp server in a folder called "coldesp_launch_pics." I presume that we are to review the files and purchase any of the ones that appeal.
I continued to work on the placenames file this week, and have whittled the list down to 33 to go: a daunting number, but I should be through them by the end of next week. From there, I will move on to tune up the vessels file, as it needs a little attention.
Just to have a place to have this recorded...
In the places.xml file, the entry for Duwamish River calls for special characters, based on the Indigenous spelling for Duwamish, seen here: http://www.duwamishtribe.org/culture.html
I used the following Unicode characters: DḵẖʷʼDuwʼAbsh
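To keep a record of exactly which characters are involved, here is a small sketch that lists the non-ASCII code points in the spelling. It assumes the precomposed forms were used (U+1E35 LATIN SMALL LETTER K WITH LINE BELOW, U+1E96 LATIN SMALL LETTER H WITH LINE BELOW) rather than a base letter plus a combining character; U+02B7 is MODIFIER LETTER SMALL W and U+02BC is MODIFIER LETTER APOSTROPHE. In the XML these could also be written as numeric character references such as &#x1E35;.

```javascript
// Assumes precomposed characters; a decomposed spelling would list
// a base letter plus a combining mark instead.
const duwamish = "D\u1E35\u1E96\u02B7\u02BCDuw\u02BCAbsh";

function nonAsciiCodePoints(str) {
  const result = [];
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      result.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return result;
}

console.log(nonAsciiCodePoints(duwamish));
```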
Spent the day working on the place-name entries. Some of them are turning out to be fairly tricky to pin down! So far, so good. I have tracked down some great new sources that I will use in the future.
I'm now saving the stats for Coldesp once a month, from UVic's Urchin; we need to do this to comply with the grant requirements. Also wrote to JS to set up monitoring for govlet.ca.
I have edited the instances of "fn" tags for 1846 - 1857 XML files. I caught a few more groups in the process, and created a list of spelling variants that I can use for 1858.
KSW reported a failure to show some dates, and issues with bolding of names on the bios page. The date problem was caused by uncertain dates without text content: I had been assuming that where dates were uncertain, a useful piece of text would be supplied in the tag content, but where that's not the case, I'm now using the @when attribute instead. Uncertain dates are also now being supplied with a following question-mark through CSS.
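The site does this in XSLT and CSS; purely to restate the display rule, here is a hypothetical JavaScript sketch. The input shape ({ text, when, uncertain }) is an assumption, and the trailing "?" stands in for what the CSS adds.

```javascript
// Hypothetical display rule for a <date>: prefer the element's text,
// fall back to @when when the text is empty, and mark uncertain dates.
function displayDate(date) {
  const text = (date.text || "").trim();
  const label = text !== "" ? text : date.when;
  return date.uncertain ? label + "?" : label;
}
```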
Bolding problems, along with some ordering issues I noticed myself, were caused by my failure to realize that there would be some people having no <surname> or <forename>, but merely a <roleName>. That's now fixed, and names are all bolded appropriately and sequenced correctly.
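The ordering fix boils down to building a sort key that doesn't assume a <surname> or <forename> exists. A minimal sketch, assuming each name has already been reduced to an object with optional surname, forename and roleName strings (the sample names are illustrative only):

```javascript
// Sort key that falls back to <roleName> when there is no surname/forename.
function nameSortKey(name) {
  return [name.surname, name.forename, name.roleName]
    .filter((part) => part && part.trim() !== "")
    .map((part) => part.trim().toLowerCase())
    .join(" ");
}

const people = [
  { surname: "Douglas", forename: "James" },
  { roleName: "Chief Factor" },
  { surname: "Blanshard", forename: "Richard" },
];
people.sort((a, b) => nameSortKey(a).localeCompare(nameSortKey(b)));
```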
As hoped, I managed to complete the people, places, and date tags for the remaining years, up to and including 1857.
Along the way, I cleaned up a number of biography entries.
Spent some time updating the website a bit, and writing some content for the final report. The content has gone to CP, who is compiling it.
These are the stats I'm submitting for the final report, along with the method I used to get the info, so we can easily repeat the process at any time.
Use the website search with no parameters except "To: 1857".
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/bios/')//person)
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/bios/')//person[not(contains(./note/p, 'not yet available'))])
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/places/')//place)
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/places/')//place[not(contains(./desc, 'not yet available'))])
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/vessels/')//list[@type='vessels']/item)
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
count(collection('/db/coldesp/vessels/')//list[@type='vessels']/item[not(contains(./p, 'not yet available'))])
SSH into nfs.tapor.uvic.ca, then:
cd /home1t/coldesp/www/jpg_scans
ls -R jpg_800 | grep '\.jpg$' | wc -l
In the eXist admin client:
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $startPages := count(collection('/db/coldesp/correspondence/')//biblScope[@type='startPageImage'][contains(@facs, '.jpg')]),
$pageBreaks := count(collection('/db/coldesp/correspondence/')//pb[contains(@n, '.jpg')])
return $startPages + $pageBreaks
Go to the map gallery on the site; the total is shown at the top of the page.
1247 despatch docs (incl. 1858)
1 browse-by-date page
5 index pages (Index + sub-pages)
219 map pages (gallery + 218 maps)
____
1472 total
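A quick arithmetic check of the page total, so the sum can be re-run if any of the counts change; the labels are copied from the list.

```javascript
// Page counts for the final report; total = 1247 + 1 + 5 + 219.
const pageCounts = {
  "despatch docs (incl. 1858)": 1247,
  "browse-by-date page": 1,
  "index pages (Index + sub-pages)": 5,
  "map pages (gallery + 218 maps)": 219,
};
const totalPages = Object.values(pageCounts).reduce((sum, n) => sum + n, 0);
console.log(totalPages);
```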
In addition to all the transcription and annotation documents, there are lots of XML feeds also used by the site and available, some linked, some not, many created as dynamic views of the data.
Team meeting to discuss the launch and the way forward. Main takeaway for me is the requirement to complete bits of the final report that are my bailiwick asap.
Also added a page to the site, listing all the First Nations names and referring strings that have been tagged so far. This is not linked from anywhere yet; it's more so that we can get an idea of what's been tagged, and how easy it might be to impose an ontology on the names mentioned. We're still discussing whether to tag all referring strings, no matter how vague, or only names and name-like strings.
Lots of final tweaking, and a rebuild of the portable ColDesp, with the latest revisions to documents included. We've installed and tested it on GN's laptop as well.
I have cleaned things up as much as possible in the time allowed. I will prepare a brief report of where things stand, and discuss this at the wrap meeting on the 23rd.
KSW reported that vessel names are not italicized inside vessel bios, which turned out to be caused by the fact that we don't process vessel name tags unless they have @keys; @keys will be added. I also tweaked the CSS so that such names are italicized automatically when they're in the vessel bio context, but not in the context of the main documents, where we shouldn't alter the text style.
There's one oddity in the vessel bio page, which is that when you click on the name of a vessel from inside another vessel's bio, the clicked-on vessel's bio appears as a popup; but in that process, it's extracted from the main list, and is never replaced in it when you close the popup, causing it to disappear from the list. That's obviously a bad idea. I'll be working on it.
After watching the presentation through on the TV at GH, I figured the final bit (snippets) needed longer display times for each of the items, so I've doubled those. I think we're ready now.
Place names and people names, as well as dates, have been completed for 1854.
Went down to GH with GN and tested our presentation played from the laptop through their large TV. Looks pretty good, although it can't do widescreen and it's CRT TV res. Might not do for showing the website, but that's not a problem really. We can find out about that on the day.
Spent most of the afternoon building a portable ColDesp, and learned that:
site/jpg_scans, and the maps in /site/maps.
<!--Start hacks for portable version. -->
<map:match pattern="jpg_scans/**.jpg">
<map:read mime-type="image/jpeg" src="jpg_scans/{1}.jpg"/>
</map:match>
<map:match pattern="maps/**.jpg">
<map:read mime-type="image/jpeg" src="maps/{1}.jpg"/>
</map:match>
<!-- End hacks for portable version. -->
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PATH=$PATH:$JAVA_HOME/bin
I've also used java-6-openjdk on my main machine, and both seem to work fine for Tomcat and Cocoon.
Otherwise, everything is working fine. I also made a small change to the schedules page at JL's request.
Note to self: having paths distributed across three files makes no sense. Could they be centralized? The problem really is the XQuery file, which can't get info from an XSLT file.
... and checked that they would fit in the car on the way back. We still need to find out if they need us to take the easels down or not.
Did some rough calculations of the scale of the remaining work and possible costs, based on work accomplished so far, for JL, who needs this for a meeting tomorrow.
I continue to work through the years of 1854 - 1857. I estimate that I should complete the tagging of places, people, and dates in these files by the 18th of June.
Lots of fixes required. I think the text got mangled or re-typed badly at some point.
Both 1852 and 1853 have had people and places tagged. It took roughly 3 days to complete 1853. If we can maintain this pace, we should get through the remaining years by June 22, but this is just a guess at this point.
Carmen Koning, External Marketing Officer, has all the materials required to begin the launch booklet. I asked that she liaise with me as needed on anything technical. I passed on the email that Kerra StJames sent me regarding payment details.
All seven posters have arrived, and they look great. However, for some reason, the URL at the top left of the "anatomy of a despatch" poster was cut in half. I will print off a sticker to cover up the mistake.
All account and payment details will be handled through Kerra StJames of the Ceremonies and Events office. I provided Island Blue with the FAST code Kerra had relayed to me by phone.
KSW and I went downtown to Island Blueprint to check the proofs of the foam boards. They look excellent. Should be finished by the end of the day.
I managed to get the last host of changes in the booklet incorporated into the latest draft, which I have sent as a PDF to Kerra and John for review. Kerra mentioned that she wanted to add the names of the recently hired actors.
Added two switches that can be set from the URL search component that can control the speed of the presentation, and the starting point, so it's no longer necessary to comment out the earlier parts while you're working on the later ones.
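A minimal sketch of reading two such switches from the query string. The parameter names "speed" and "start" are hypothetical, not necessarily the ones the presentation uses; invalid or missing values fall back to normal speed and the first slide.

```javascript
// Hypothetical switches: ?speed=2&start=5
function readPresentationOptions(search) {
  const params = new URLSearchParams(search);
  const speed = parseFloat(params.get("speed"));
  const start = parseInt(params.get("start"), 10);
  return {
    speed: Number.isFinite(speed) && speed > 0 ? speed : 1, // speed multiplier
    start: Number.isInteger(start) && start >= 0 ? start : 0, // first slide index
  };
}

// In the browser: readPresentationOptions(window.location.search)
```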
Reworked a lot of the existing presentation, and wrote the two final sections, Statistics and Snippets. I think the whole runs to about 5 minutes now, but I haven't timed it to see exactly. All tested and working in Firefox, Opera, Chrome and Epiphany.
I have finished tagging the CO 410 files with "biblscope" and "pb" tags up to and including 1858. Since 1858 is off the radar for this launch, I will add the "pb" tags only, but they still require proofing and tagging of places, and people, in some cases. I have made a note of this.
I now have five different maps in the map section, showing a fair cross-section of the different types we have. That's probably enough for that section. Next I'm going to see if I can find any snippets that are short and snappy enough to work. I have many of KSW's, and a few of my own harvested today, but most are too long to work in this kind of presentation. 19th-century bureaucratic prose rarely contains sound bites.
It appears to be the case that if you call JQuery's animate() passing any set of options, in Webkit browsers, where the starting value and ending value for e.g. top or left are the same, the browser will elect to animate them anyway, setting them to zero before starting the animation. This was throwing out all my custom animations in Chrome and Epiphany. So I've rewritten the code so that where a value is not going to be altered, it's never actually passed in to the animate() call. The presentation is now functioning correctly on FF, Webkit and Opera engines.
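The workaround can be sketched as a helper that builds the property map for animate() from only the values that differ from the current ones. This is an illustration, not the presentation's actual code; it assumes current values are read in the same form as the targets (e.g. "10px").

```javascript
// Keep only the properties whose target differs from the current value.
function changedProps(current, target) {
  const props = {};
  for (const key of Object.keys(target)) {
    if (current[key] !== target[key]) {
      props[key] = target[key];
    }
  }
  return props;
}

// Usage in the browser, with jQuery loaded:
// const props = changedProps({ left: "10px", top: "50px" }, { left: "10px", top: "200px" });
// if (Object.keys(props).length > 0) $(slide).animate(props, 1000);
```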
Greg raised the possibility of using the kiosk presentation code I've been working on as the basis for a manually-controlled presentation, as a sort of replacement for S5, and I've been thinking hard about how that could be done. These are my ideas so far:
display=kiosk, display=manual).
HcmcSlideList object as currently, but then call a new method on it: setupManual(). Don't call next().
setupManual works through the complete set of slides, and figures out the sequence of hide and show events. It adds them all to an array, in a form in which they can be called using eval().
HcmcSlideList::manualNext() passes the next item in that array to the eval() command, causing the next slide to be shown or hidden.
The HcmcSlide object keeps a reference to its original parentNode as one of its properties, stored when it's created.
The HcmcSlideList object maintains two other arrays: hiddenSlides and shownSlides. Whenever an item is shown, its index in the HcmcSlideList.list array is pushed to shownSlides.
hide action, this is what happens (pseudo-code):
slides.list[i].hide();
slides.hiddenSlides.push(slides.list[i].parentNode.removeChild(slides.list[i]));
This removes the hidden slide from the page, but keeps it in an array so it can be restored if necessary.
shownSlides array.
HcmcSlideList::back() is called:
HcmcSlideList.counter.
hiddenSlides array, and append it again to its parentNode.
shownSlides array, and set it to "display: none".
This should enable backtracking through all the slides without any transitions, which is probably what you want when going back through the slides. Because we're decrementing the counter, moving forward again will run the hide and show transitions as expected, so you can resume moving through the array. There are only two issues:
HcmcSlideList::manualNext() several times in rapid succession, although each individual transition will take as long as it takes, they'll all run simultaneously, so that might not be a big problem.
jumpTo parameter in the URI, which would cause the setup code to run all transitions prior to the jumpTo point simultaneously.
I met with John, who gave me some instructions for the booklet. He then sent on the first draft of the booklet copy. I used both to build a working draft, which will be used as a guide by the UVic communications team, on Kerra's side.
I sent Kerra a copy, in PDF form, for her feedback. I will hear back from her early next week, and incorporate her changes. Still to be finalized from our team: some image placement, copytext, and final proofing of the same.
In consultation with the team I have completed 7 poster boards for display at the June 22nd launch. I have shipped JPG versions to Kerra, who has responded she will review them early next week.
As for proofs, once Kerra gets back to me with her changes, if any, I will drop off PDFs at Island Blueprint, hopefully, by next Wednesday.
Finished off the section on the digital despatch, with lots of screenshots, and began the section on maps.
This required me to tackle the remaining features I hadn't added yet, for a variety of different transitions used to show content, because I need to have content arriving from different directions in this section. I've now refactored all the showing and hiding code, and I have a total of six different transitions for showing and six for hiding a slide. I've cleaned up a lot of that code, so it's now fewer than 200 lines, and I've solved one issue with object encapsulation that was puzzling me. When you pass setTimeout a string of code, that string is evaluated in the global scope once the timeout is up. My problem was that I wanted to set the timeout from inside an object method, calling another method of the same object; however, the "this" keyword won't work in that string, because it's executed outside the object scope. The solution was to add a parameter to the object constructor in which the name of the global pointer that refers to the object is passed in; this way, the object "knows its own name", and can set a timeout calling its own method using that variable name.
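For comparison, here is a sketch of an alternative that avoids passing in the global name: setTimeout also accepts a function, and an arrow function keeps "this" bound to the object. The class and method names are hypothetical, and the scheduler is injectable only so the logic can be exercised without real timers.

```javascript
class SlideTimer {
  constructor(schedule = (fn, ms) => setTimeout(fn, ms)) {
    this.schedule = schedule; // injectable so timing can be stubbed
    this.shownCount = 0;
  }
  showNext() {
    this.shownCount += 1; // stand-in for revealing the next slide
  }
  scheduleNext(delayMs) {
    // The arrow function captures "this", so showNext() runs on this
    // instance even though the timer fires outside the method's scope.
    return this.schedule(() => this.showNext(), delayMs);
  }
}
```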
There's one remaining issue with webkit browsers (Epiphany and Chrome) whereby moving show transitions always seem to start from top left, even when they're asked not to. I'll work on that next week. I may have to explicitly set the slide's left and top to what they already are (left = offsetLeft + 'px') so that they can retrieve a working starting value for the motion calculations.
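A sketch of that possible fix, assuming the slide is absolutely positioned relative to its offsetParent (offsetLeft and offsetTop are measured from there): copy the rendered position into the inline style before starting the motion.

```javascript
// Give the animation an explicit starting value for left/top.
function pinStartPosition(el) {
  el.style.left = el.offsetLeft + "px";
  el.style.top = el.offsetTop + "px";
  return el;
}
```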
I realized that the presentation code really needed to be able to handle variable triggers for the following item, so that items can appear in quick succession where necessary, so I've revised the back-end code. It's now quite flexible and getting rather sophisticated; it'll definitely be usable for future kiosk-style presentations, and GN also suggests we make it switchable into a manual presentation. I think that could be done by setting a flag in the JavaScript whereby, instead of triggering the presentation on startup, the code instead parses through the list of slides and figures out what order each of the appearance and disappearance actions would occur in, and then places them in an array, where each can be triggered manually through an eval() call. The only difficulty there is that it's going to be hard to make it possible to go back. But I'll keep thinking about that.
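One way the manual mode might look, sketched under assumptions: the show/hide sequence is precomputed as plain objects rather than strings for eval(), and going back simply reverses the last action instantly. ManualSlideList and the visible flag are hypothetical stand-ins for the real slide objects and transitions.

```javascript
class ManualSlideList {
  constructor(actions) {
    // actions: [{ type: "show" | "hide", slide }] in presentation order
    this.actions = actions;
    this.counter = 0;
  }
  next() {
    if (this.counter >= this.actions.length) return false;
    const { type, slide } = this.actions[this.counter];
    slide.visible = type === "show"; // stand-in for running the transition
    this.counter += 1;
    return true;
  }
  back() {
    if (this.counter === 0) return false;
    this.counter -= 1;
    const { type, slide } = this.actions[this.counter];
    slide.visible = type !== "show"; // undo instantly, with no transition
    return true;
  }
}
```

Because back() only decrements the counter, calling next() afterwards replays the same action with its normal transition.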
I also tweaked the existing content, and added some more bits. I think the content is about 50% complete now.
JL has provided a lengthy description of the project, from which I'm trying to harvest the key points and turn them into the meat of the automatic presentation, illustrating it with images wherever possible. I'm probably about a third into it by this point, still working on the best ways to handle the requirement to display on a variety of screen and font sizes. I've settled on positioning and sizing in percents, on the basis that I can then adjust font sizes through the browser to get the best use of screen space (= largest display text that will fit).
It's still pretty spartan, but once the content is done I can start tarting it up a bit.
The presentation seems to be coming together. JQuery animation has some problems, so as happened before when I tried to use JQuery, I've ended up using less and less of it, and writing more of my own code. I have a slide object and a slide list object, and the list manages slide timing and transitions, and I've set it up so that the original slide <div>s are placed on the page where you want them to end up, using @style attributes. The display time for a particular slide is placed in its title attribute (a hack, but not actually breaking any rules), and the slides are shown in the order of the <div>s in the page. That means you can create the slide show by editing HTML, and not worry much about the script.
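A sketch of turning the slide <div>s into a schedule in document order. It assumes the title attribute holds a duration in milliseconds; the fallback duration and the "div.slide" selector are hypothetical.

```javascript
const DEFAULT_DURATION_MS = 3000; // hypothetical fallback for a missing/invalid title

function buildSchedule(slides) {
  let startMs = 0;
  return slides.map((slide, index) => {
    const parsed = parseInt(slide.title, 10);
    const duration = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_DURATION_MS;
    const entry = { index, startMs, duration };
    startMs += duration; // next slide starts when this one's time is up
    return entry;
  });
}

// In the browser: buildSchedule(Array.from(document.querySelectorAll("div.slide")))
```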
Working on basic code for an animated presentation using JQuery. Found two bugs in JQuery already. Grrr.
The three sizes of images for the co410/1 group have been uploaded to the coldesp server.
Discussed the foam boards and the program.
Dug out my original launch poster, and added a faded-out map in the background, then reworked it for the new launch at 36 x 48. Seems to work OK...
Met with JL and KSW to talk about graphics for the launch. Foam boards are more or less organized; we've selected a map (co_700-bc_2_van_isl_1854), and we'll work on the "splash" poster tomorrow. Also made a tweak to logos on CP's instructions.
Time-consuming markup of one of the Spanish maps of Juan de Fuca Strait. I'm slowly getting familiar with the placenames, though.
Marked up a 1789 map of Ahousat. It doesn't look much like modern-day Clayoquot Sound, though, so I've found it almost impossible to set the Google Map coordinates with any precision.
I have produced 3 poster boards this week, with a fourth on the way. Here is a quick rundown on each.
1. Splash Poster: this is, more or less, a logo-only poster, that showcases the project name, intention, and years of coverage [incomplete].
2. Despatches by Numbers: lists some key numbers associated with the project thus far, such as how many images we have produced, instances of various things, years covered in the collection, and so on [draft complete].
3. What's in a Despatch: anatomy of a typical letter, which points out, with pictures and connector-lines, details such as the despatch number, salute, marginalia, and more. The intention here is for the viewer to get a quick sense of the intricacy and minutiae of the handwritten letters [draft complete].
4. What's in a Digital Despatch: as above, but the digital version. This will give viewers a sense of how we "transform" the handwritten letter into something you can "interact" with on the web. It shows some of the basic features of the site, such as the navigation bar [draft complete].
5. The Despatch Maps: this page is yet to be designed, but it will function as above (see 3 and 4) to show viewers the things they can expect to do and find with respect to our online maps [incomplete].
6. The Douglas Collage: this is a bit of fun, and a way to show the viewer the sheer volume of the letters available. It will feature a collage of images built from the images in the collections [in process].
Working on the second of the Spanish maps, from 1779, I had only a few items to mark up, but a huge amount of quite interesting research to do. I eventually discovered, from some web research and parsing through the lengthy Spanish caption, that the longitude readings on the map are relative not to Greenwich but to San Blas (see this ref, which helped to figure out the range of the map). I also identified a few places that were previously mysterious, including Punta de los Mártires (Point Grenville, Washington State), Entrada de Hezeta (apparently the Columbia River delta), and Las tres Marias (now Islas Marías, Nayarit, Mexico). RS from Hispanital has helped me a lot with the transcriptions.
Did a bit of research for the next map, which I haven't started yet: it seems that Friendly Bay is now Yuquot, and Scott's Bay may be Eagle Bay (or may not), and is probably in Barkley Sound. More work to do there.
Added several more logos to the credits page, some of which had to be pieced together from component parts.
Work completed today:
<ref type="map" cRef="[mapId]#[placeId]"> handling so that you can link directly to a spot on the map from any TEI document.
These features are now complete:
I think this is basically ready to go now. I'll push it up to the site tomorrow morning.
The maps will eventually mostly have geo coordinates relating them to the area they cover, and I've now modified the code which creates KML files from place information in the places collection so that it can also create KML based on a map document id. This enables us to create a link to Google Maps, passing the KML URI to Google, to see the area laid out over a modern-day Google map.
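The project's KML is generated in XQuery; as a language-neutral illustration, here is a hypothetical JavaScript sketch of the output shape. The place objects and field names are assumptions, and the sample coordinates are illustrative; the KML elements themselves are standard. Note that KML writes coordinates longitude first, then latitude.

```javascript
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// places: [{ name, lat, lon }] (assumed shape)
function placesToKml(places) {
  const placemarks = places
    .map(
      (p) =>
        `  <Placemark>\n    <name>${escapeXml(p.name)}</name>\n` +
        `    <Point><coordinates>${p.lon},${p.lat}</coordinates></Point>\n  </Placemark>`
    )
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n' +
    placemarks +
    "\n</Document>\n</kml>"
  );
}
```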
Can't test this till we have it up on the main site, because it works from a hard-coded site URI which is supplied to Google.
Working most of the day on the map display, and getting close to completion. I've now got the menus functioning as required -- looking pretty much like the main menu on the rest of the pages, but right-aligned; and I've got the access keys working for all the menus except for the items in the drop-downs. The site banner graphic is now included, as is the metadata (both displayed for the reader and included in Dublin Core in the header). There's a lot more tidying up to do, and I need to test on IE (it works on the other browsers), but we're pretty close now.
Nine files in 1858 require proof against their respective images, which are in CO 6/26 [on the coldesp server already in LAC scans folder: B-3009 and B-3010]:
B585HB06.scx, B585HB07.scx, B585HB09.scx, B585HB10.scx, B585HB11.scx, B585HB14.scx, B585HB35.scx, B585MI01_A.scx, B585MI02.scx.
For now, since we are scheduled to deliver up to 1857 (for our June 22nd, 2010 launch), we will put the processing of the images required for these files on hold.
A nagging hangover from Before the Great Image Processing was several placenames without write-ups, largely from 1852. I have finished these off, and added a few others. We are now at a total of 133 placename write-ups!
I've managed to make the map sizing code relative -- MJ's page layout was done in pixels originally, but the underlying map handling is within a box that can be sized easily, so that wasn't too hard. I've also started integrating the site style a little bit, starting with the menu, which now looks a bit like the main site menu. I have to decide what components of the main menu I want to include on this page -- there isn't room for them all. Probably just Home and Map Gallery. I'll need to enable keystroke navigation somehow, to comply with our accessibility policy, and that will be quite hard. Then I have to find a good place to put the copyright/disclaimer info on the page, and find space for the metadata about the map. It's going to get a bit crowded. No room for the header graphic, unfortunately, although I'll have to try it out just to be sure.
Working all day on integrating MJ's map display code into the site. Much of it was positioned and sized in pixels, which is natural because the main component is an image sized in pixels, but it is possible if you're careful to size all the text components in ems and percentages to make for flexibility. I laid out the menus using a different approach (inline-display list items, with the right menu floated, rather than floating list-items). Then I started work on the display of the annotations. I've arrived at a solution that I think works pretty well, but there are still some oddities with regard to positioning that I need to work on, because I'm placing the annotations in a static location. I'm half-way there.
This all precedes somehow adding the main site styling to the page, which is going to be a challenge because of space constraints. I also have an issue with some zones not showing up; these appear to be the ones which have customized id attributes because they're going to be pointing at places in the places database.
While working on this, I also re-organized the map gallery display, to make it less confusing. The mouseover popup now appears in the right margin rather than over the top of the moused-over image, which means that the original is not obscured, so you can more easily click on it.
I have processed the images for CO410, Volume 1. But, I am concerned that there is a page missing, so I have asked Chris to re-order the reel. Better safe than sorry. Once the reel arrives, it should take only minutes to find and incorporate the file, if it is indeed our mistake.
Meeting with KSJ and KSW at Govt House to discuss arrangements for the launch in June. Took notes and video of the location, and agreed on some basic plans for AV, foam board visuals etc.
CP was doing a report and needed some stats, so I generated them. There are currently:
There are:
As yet, we have no processed page-images from CO 6, but we have 10,652 images waiting to be processed from CO6/18 through CO6/36.
As we added two new images to the RG7, volume 1, collection, I had to rename the 60px, 800px, and full size images, respectively. I completed this and uploaded the renamed images.
Brought down all the latest updates from KSW's changes, and deployed them to the db. Rewrote the backup-to-local scripts to find stuff in the places it's now located on the server.
It turned out that we missed only two images in RG7-G8C, volumes 1, 2, and 3. I have processed them and added them to the server in the appropriate folders. I then returned the film reel to Chris' desk, in his office.
I have roughly 200 xml files left in 1858 for PB-tagging. So, over 400 completed already! I forecast roughly 4 days to finish the rest.
Caitlin will use up the rest of her hours on time, and this should leave us with a much-improved vessels database. I will work with her next Friday to tune it up for publication.
Meeting with UVic Ceremonies to plan the launch. Planning for AV will have to wait until May, when we can get down to Gov't House with KS to find out the lay of the land. In the meantime, following the meeting, KSW and I did some research on poster and foam board printing, since it's likely that some maps and pictures on foam boards might be a good alternative to projectors and screens.
Thanks to Leanna's relentless capacity for toil, bless her, we are in better shape. I should be able to complete the PB tags for the remaining 1858 XML files by, roughly, April 20th, possibly sooner.
From there, I will process the images for 410/1, which should take a few days. Then, I will move on to tagging people and places in years 1852-1857 inclusive--as 1858 is, presumably, done already.
I've enhanced the map gallery so that something slightly more attractive than a tooltip shows the details of the maps when you mouse over them. I've also added some drop-shadows, so the maps look more like other documents on the site. I think there may be some more cosmetic changes in future, but I think this will do for now.
Images for 1855 (CO 305 06) are now on the site, indexed and working, and I'm running the rsync operation from yesterday to back them up to Rutabaga.
Finally did something I've been meaning to do for ages: created a full backup of the processed page-images from Lettuce to Rutabaga. This took literally all day (I throttled rsync a bit to make sure Lettuce didn't struggle). I realized I could also have logged on directly to nfs.tapor.uvic.ca, taking Lettuce out of the equation; might do that in future. This is the process: ssh into lettuce, navigate to the coldesp www folder, and run:
rsync --verbose --progress --stats --compress --recursive --times --bwlimit=10000 jpg_scans/ -e ssh mholmes@rutabaga.hcmc.uvic.ca:/"home/mholmes/backups/Martin/Colonial\ Despatches/www/jpg_scans/"
I did the equivalent for the maps directory as well.
After more hassle than I expected getting forms to submit with correct values, I now have a flexible and responsive map gallery, with paging features, and the ability to change the sort sequence and the number of items displayed on a page. Next is getting the individual map display code working, which means figuring out how MJ's code works, and trying to fit it into the context of the site.
Note to self: remember, in future, that rather than trying to get a form to submit itself in the old-fashioned manner, it's much simpler to write a bit of script that grabs all the values you want and constructs a GET URL, then sets the location to that. Even simpler might be AJAX, but in this case it seemed overkill so I went the route of a traditional form page. And gave myself loads of trouble as a result (e.g.: if you try to trigger submission from the onchange event of a <select> element, you'll have trouble getting the value of the selected option, and the onsubmit event of the form will never fire).
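A minimal sketch of that script-driven approach: collect the values, build the GET URL, and navigate to it. The parameter names ("sort", "perPage", "page") and the page name are hypothetical.

```javascript
function buildGalleryUrl(basePath, values) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(values)) {
    if (value !== undefined && value !== "") params.set(key, String(value));
  }
  const query = params.toString();
  return query ? `${basePath}?${query}` : basePath;
}

// In the browser, e.g. from a <select> onchange handler:
// window.location.href = buildGalleryUrl("maps.htm", {
//   sort: sortSelect.value, perPage: perPageSelect.value, page: 1 });
```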
Tested and working in FF, WebKit (Epiphany), Opera, and IE8.
Spent most of the day writing the map browser for our collection of 219 maps. I have it basically working, in a way that's similar to the Mariage site (but much simpler from a CSS point of view). I haven't yet got the Previous and Next buttons working -- in fact, it'll probably be a bit more sophisticated than that, allowing the user to decide how many maps to view on one page, etc. But the basics are all done, including some complicated bits to lay out the variously-sized thumbnails in an even manner on the page.
The identifiers for the RG7 images differ slightly from the format of the CO ones, in that the central component (G8C in this case) cannot be reduced to an integer, so when I uploaded these images I had to tweak a bit of XQuery to make the image browser display them properly.
NAC, or RG7, G8, Volume 1 covers the years 1849-58, inclusive. So, even though we have RG7 volumes 2 and 3, I will ignore those for now, until we need them.
Also, Chris has reordered the RG7 reel, as we think we might have missed an image. If so, it will take only minutes to find it.
Small changes to the 1856 image file names required a complete re-upload, which is slightly complicated by the need to delete some files whose names are no longer in use. Lots of cautious rsync operations did the job; then I had to update the XML file which lists all the scans. That file's getting a bit big, so I'm wondering whether we might start breaking it down for ease of maintenance at some point.
The four trial markup files had to be integrated back into the map setup, because the format of the files and the metadata content is now more detailed, and the "large" size maps are now slightly differently sized from the ones I was working with when I did the markup. This involved some juggling and moving all the markup zones in some of the files. That's now done.
Completed the creation of XML files for the maps I've identified as relevant (now down to 218). The images are on the server, and I can now start looking at the creation of a gallery and integrating the rendering code MJ has written.
Today I finished my little QT app to handle generating the IMT TEI files for the maps, and began using it to churn through the maps. So far, I've produced 160 XML files and associated map resizings. Found a couple of oddities in the process, and there are many maps where the metadata from the Excel spreadsheet is generic to a group, rather than specific, and these will need to be enhanced. But basically the process is working. I'm done up to 1861; I'll start from 1862 tomorrow.
I've uploaded the images from CO 305 05 and 07, and added entries to the db for all of them. This leaves CO 305 06 (1855) to be done, along with the CO 410 and RG images.
Leanna finished 1854 and I finished 1856. We will now move on to adding the page-break tags for these years. Firstly, however, I will add the two or three new place-name entries gathered from 1852 (completed earlier in the week).
This is what the app now does:
I need to auto-generate IMT files for all my maps, and populate them with as much metadata as possible. To that end -- and to continue my learning process with QT -- I'm writing a little QT app to do it. So far, it can:
Much of the rest of the metadata will be simple, but some will not -- I'll have to parse the Excel spreadsheet to get descriptive information, for instance, which will take some work.
Copied all the JPGs into a thumbs directory and created all the thumbnails thus:
mogrify -resize 100 *.jpg
Ready to start looking at how to generate all the XML files. That's going to be a bit of a hard one, but I think it might be possible to do it with a little QT app, and it might save an awful lot of time.
After a process of careful renaming, as well as checking against the spreadsheet to correct original naming errors and replacing some more unstitched components with complete versions, I now have 220 maps with relatively useful filenames which are designed for the web. Many of the filenames do not yet reflect the contents or include the year, but that can be done later. We have enough to be working with for the moment.
I now have a collection of usable maps (226 in all), sorted by year. Next steps:
Worked through the 30 or so new maps from DVDs (and created a local copy of the DVDs). These are now sorted by year, and their filenames have some useful info in them. Next I have to find all the stitched items that JF did, and replace my unstitched fragments with them.
Some good news. Leanna has moved on to parsing the 1854 images, and is well under way. I have completed all but two (curses!) 1852 files (places, people, vessels, and page breaks). I should be ready to post 1852 on Monday afternoon. Lastly, Frank has submitted 13 biographies. I will proof and post these once 1852 is complete, just for a brief change of pace.
Looking ahead, I will move on to prep/parse the 1855 images, then see where Leanna is at. Image-prep is a priority now, as we can't do much without the images!
LSPW has finished processing and linking the 1853 images, so all those files have been added into the system (CO 305: 4). Also trimmed out one file which was a dupe (V525HB00, duplicates 01).
Fixed the bug which was causing abstracts to disappear on a page accessed through a search. It was a trivial bug, but took a while to find because I'd confused myself by leaving an old version of an XQuery file in the tree, alongside a new one with a slightly different name. Doh.
In the process, though, I finally set up a local copy of the site running in my local Tomcat, which makes for easier debugging and dev work.
KSW discovered a very odd piece of behaviour on the site. When you go directly to a document with an abstract, the abstract appears as normal. If you go to the same document with a search string in the URL (as you would if it were the result of a search), then the abstract does not appear. I've been working on this for an hour, and I can't figure it out. So far, I know:
<notesStmt> component of the <teiHeader>, but it's failing to do so. It's successfully inserting other components of <fileDesc>, which is very odd indeed.
The placenames were given a "final" edit. Then I moved on to the various tagging elements in 1852, working mostly through the more common people and places. I expect it to take roughly 5 days to complete all tagging and page-break tags. Meanwhile, Leanna is roughly 1/3 of the way into the page-break tags for 1853!
Well, some good news: Leanna has finished the image processing for all of 1853. Next up, she will add the page-break tags for the same year. And I have completed all the placename entries to date. There will, of course, be many more to come, but the backlog is over, so I will now be better able to keep the placename database up to date. Finally, Caitlin has finished tagging the 1578 and 1858 vessels. The extra help is really starting to pay off!
Leanna and I finished scanning CO410 (V1), and CO410 (V2). We will tackle the RG series next week!
On Dandelion, which runs Karmic, Google Earth was a cludge-monkey. Then I tried installing version 5.0 from here <http://earth.google.com/intl/en/download-earth-advanced.html> -- be sure to select the radio button for 5.0! Now G Earth is running like hot butter.
Took a detailed look at newly-edited files from 1851 and 1858, validated them all, merged them into the regular tree, and updated the site.
Took a final pass at 1851 and it looks ready to ship, barring any niggling errors we catch in the coming weeks.
I will now catch up on some placename write-ups, and then dig into 1852 on Thursday. From here on out, we are in "just-the-basics" mode. This means that we will tag people, places, vessels, First Nations groups (when we catch them), and add page-break tags. All fine-tuning of format or transcription, and the addition of abstracts, will be put on hold until time allows.
Meanwhile, Caitlin will continue with tagging vessels, and Leanna will continue with her work on the images.
As of Friday, Feb 12th, there are 6 abstracts to complete, which keeps me on track for the predicted end date for 1851. Once we "complete" the year, it might be good to do a quick check of the 1851 files. This should wrap up '51 by Monday afternoon!
Leanna is moving efficiently through the '53 images, and Caitlin has added several new vessels to the vessels file.
Caitlin is moving ahead nicely on the vessel tagging and placeholder entries. And, Leanna is underway with the image processing for 1853. Meanwhile, I am completing the abstracts for the remainder of 1851, a process that, barring interruption, I should complete by the end of Friday.
This afternoon I walked Leanna through the image-processing workflow. She will use Photoshop to start, but I showed her Picasa as well, so that she can decide which is easier to use. I suggested she work in Photoshop for half the day and Picasa thereafter, before she decides.
I will ask Greg if it's alright to install Picasa on her machine.
Went over to CP's office to plan the microfilm ordering and digitization, and introduce LSPW. Some films should arrive within a week or so.
Managed to get through all of 1851 last week, save the abstracts, which I will likely complete by mid-to-late Tuesday. I also moved ahead in the time remaining last Friday, to tag all instances of Vancouver Island in the 1852 files.
Looking ahead, we have Leanna on board for 17 hours per week. For the first little while, she will focus on processing and optimizing the images for the remaining years, and she will add some extant images wherever possible. As with Caitlin, once she becomes more familiar with things, we will sort out what she most wants to do.
Caitlin will push ahead with the vessels database, and I will ask that she tackle a few vessel write-ups soon.
MJ and I have spent some time working out an approach to the map rendering which will keep the page fairly clean and free of clutter, and he's devised a CSS-based drop-down menu system for it. Today I've sent him an outline of the transformation that needs to be written to create the basic output; he'll work on that, and when it's ready, I'll integrate it into the main site (adding calls to headers etc.), and get familiar with the code in the process.
This was the map from despatch 9099 (1852). Uploaded my places file and checked my coordinate outlines on Google Maps. I'm now ready to start writing the code for map rendering.
Nothing in particular to report from this.
The second map contained a detail which was an inverted plan of the fort, so I've extracted that into a separate image and marked it up separately. I've now completed markup of three images (= two maps), and I'm onto the fourth. Once that's done, I'll stop and focus on the backend code for handling the maps.
Our problems with group settings for files and directories have continued to plague us, and we discovered today that some of it was my fault; my permissions script was setting permissions to 2755 instead of 2775, due to a typo. Having fixed that, and had all three of us run the script, the only problems that remain are files and directories created by LR, on which she never ran the script. Those can't be deleted by us, unfortunately, since she's now left, so eventually we'll have to get sysadmin to do it.
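A minimal sketch of what a corrected permissions script might do, assuming directories should be 2775 (setgid, so new files inherit the directory's group, plus group write) and files 664. The file mode is my assumption, since only the directory modes are recorded here. Note that chmod only succeeds on files you own (or as root), which is why anything created by someone who never ran the script stays stuck until sysadmin steps in.

```python
import os
import stat

DIR_MODE = 0o2775   # setgid + rwxrwxr-x; the typo'd 0o2755 lacks group write
FILE_MODE = 0o664   # assumption: rw-rw-r-- for ordinary files

def fix_group_permissions(root):
    """Apply DIR_MODE to root and every directory below it, and FILE_MODE
    to every other file. Symlinks are skipped, since os.chmod would follow
    them and change the target instead."""
    os.chmod(root, DIR_MODE)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.chmod(path, DIR_MODE if os.path.isdir(path) else FILE_MODE)

# The difference the typo made: 2755 has no group-write bit at all.
assert not 0o2755 & stat.S_IWGRP
assert 0o2775 & stat.S_IWGRP
```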
Went through the handful of places I had created for my first map, and used the new Google Earth/XSLT process to add more detailed location information. The first trial failed in Google Maps, and I discovered it was because the Google Earth data comes out with 13 decimal places, which won't work; I tweaked the XSLT to produce 6 places only, and that works fine. So we now have some nicer outlines of places. The match between outlines in Google Earth and Google Maps is not precise, but it's close enough to live with.
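The fix itself was a change to the XSLT. The same logic, sketched in Python for clarity, assumes the input is the text of a KML <coordinates> element (whitespace-separated lon,lat[,alt] tuples). Dropping the altitude is my own assumption, on the basis that Google Maps only needs the two coordinates.

```python
def trim_coordinates(coord_text, places=6):
    """Round each lon,lat tuple in a KML <coordinates> string to `places`
    decimal places; any altitude value is dropped (an assumption)."""
    trimmed = []
    for point in coord_text.split():
        lon, lat = point.split(",")[:2]   # KML order is lon,lat[,alt]
        trimmed.append(f"{float(lon):.{places}f},{float(lat):.{places}f}")
    return " ".join(trimmed)

print(trim_coordinates("-123.3656789012345,48.4284207123456,0"))
# -123.365679,48.428421
```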
Selected a second map to work with, and began marking it up. I've chosen a simple sketchmap by Vavasour of Victoria Harbour and the Gorge. That shouldn't take long, and will perhaps generate one or two more places. After that, I'll have to pick a more substantial map for my third pilot document.
Google Earth gives us the option to designate a polygon on the map and export it as KML. I'm already producing KML from TEI <location>/<geo> tags to map our places onto Google Maps, but I've now written the reverse process so we can easily create our place markup in TEI using Google Earth. This will improve accuracy and save time over what we were doing before. KSW has a copy and has a transformation scenario set up in his oXygen environment.
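A rough sketch of the reverse process, in Python rather than XSLT: read the first Polygon from a Google Earth KML export and emit a TEI <location> with one <geo> per vertex. Two things worth flagging. KML lists coordinates as longitude,latitude[,altitude], whereas TEI's default <geo> notation (absent a <geoDecl>) is latitude then longitude separated by whitespace, so the order has to be swapped. And the one-<geo>-per-vertex output is an assumed structure, not necessarily what our places files use. Elements are emitted without a namespace, on the assumption they get pasted into a TEI file with a default namespace.

```python
import xml.etree.ElementTree as ET

KML = "{http://www.opengis.net/kml/2.2}"

def kml_polygon_to_tei(kml_text):
    """Convert the first Polygon in a KML document to a TEI <location>
    string with one <geo> ("lat long") per vertex."""
    root = ET.fromstring(kml_text)
    polygon = root.find(f".//{KML}Polygon")
    coords = None if polygon is None else polygon.find(f".//{KML}coordinates")
    if coords is None or not coords.text:
        raise ValueError("no Polygon coordinates found")
    location = ET.Element("location")
    for point in coords.text.split():
        lon, lat = point.split(",")[:2]                       # KML order: lon,lat[,alt]
        ET.SubElement(location, "geo").text = f"{lat} {lon}"  # TEI default: lat long
    return ET.tostring(location, encoding="unicode")
```

Because a KML LinearRing repeats its first point at the end, the output will too.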
This is a set of basic instructions for marking up maps for the Despatches project, using the Image Markup Tool. These instructions will be extended and refined as the process shakes down.
<teiHeader> editing area (TEI / Edit teiHeader), and change all the metadata you see to match the image you're working on. Note: for the date of the map, you want to provide (a) a date range using @notBefore and @notAfter, covering all the dates mentioned on the map (from surveys, revisions etc.); and (b) a single date as the text content of the date tag, which should be the date most prominently shown in the map title area. Where there is no useful date at all, you can use "n.d." in the date tag, and make an educated guess for the date range.
<geo> tags in the <sourceDesc>/<bibl> element. If that's not possible (because, for instance, the map is hopelessly inaccurate), then delete the <geo> tags.
You'll see that there are three categories for annotations: transcription (areas in green), places (areas in blue), and notes (areas in red).
The places category is used to define places which are included in our database of places (in the XML files in xml/places). Of course, you may decide as you're marking up the map that a place on the map deserves to be in our places database, and add it to one of the XML files to make this happen. Encoding place information is documented in our main guidelines.
To mark up one of these places:
places category.
@xml:id attribute which is associated with the place in the places XML file. (You can easily discover this by going to the Places index and hovering your mouse over the name of the place).
<p> tag, transcribe any text label which identifies the place on the map, and mark it up based on its appearance on the map.
Here's an example. Imagine that you're marking up Parry Bay on a map.
Parry Bay in the Annotation Title box.
parry_bay.
PARRY BAY in the Annotation Text box (because on the map, it's labelled in capitals).
All other text on the map, including placenames which are not in our database, is transcribed using the transcription category. This is how to transcribe placenames appearing on the map:
transcription category.
<p> tag, transcribe and mark up the text as it actually appears on the map.
For example:
<p>Ned P<hi rend="text-decoration: underline; vertical-align: super; font-size: 80%;">t</hi></p>
All other text on the map -- titles, publication info, explanatory text etc. -- is also marked up using the transcription category:
transcription category.
Title, Publication information, etc.).
Sometimes it's necessary to add an editorial note or explanation to something on the map which wouldn't otherwise be marked up. Use the notes category to do this.
A link to a map document on the site looks like this:
<ref type="map" cRef="MPK1-59_10_vancouver_island_1846_detail">[Linked text]</ref>
The @cRef attribute contains the @xml:id (filename without .xml) of the map annotation document.
I've now done a trial markup of a map (an 1846-48 map of the Vancouver Island coast east of Sooke). This has thrown up the following points for consideration:
Other than that, the process of marking up with Google Earth to find coordinates works pretty well, and IMT is working well under WINE now I've fixed a couple of issues with it.
The use of the IMT requires the facsimile element, which wasn't available at all when the original ColDesp schema was generated, so I had to generate another one. Working from the original ODD file, Roma produced a broken schema for some reason, so I started from scratch and added all the modules from the original ODD, and ended up with a working schema. It's a bit smaller than the original -- although it could be substantially reduced by running oddbyexample.xsl. I'll do that at some stage, but not now because the map markup will introduce lots of requirements we haven't had before.
It looks as though the IMT has some small issues on Linux that I didn't know about (its own ident information is not available to it, so attributes remain unfilled in the saved file). But I have the first file started, and some transcription done; I still need to figure out exactly what kinds of copyright info and disclaimers need to be in the document. That should be addressed at our meeting.
The main repository for metadata about the maps will be the library, which is the formal custodian of the files and is best set up for storing and serving good metadata. Nevertheless, we'll need to store a significant amount of information in our <teiHeader>s, and before I figure out how best to store it, I want to enumerate it, along with some notes on potential issues:
FO925-1650_pt1_23_becher_pedder_bays_1846-48.jpg. @xml:id should be filename without extension.
<titleStmt>.
<titleStmt>, but both titles should also be included in the <sourceDesc>/<bibl>.
<titleStmt>/<respStmt>).
@notBefore and @notAfter. This allows us to sort them by date, but still show the full range of associated dates.
<publisher> and <pubPlace> tags in the <sourceDesc>/<bibl>.
<idno> tag.
That's probably all that we need for the moment. Other text on the map which does not identify locations should probably be transcribed, so we would have two categories of annotation: places and transcriptions.
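Two of the points above (the @xml:id as filename minus extension, and the @notBefore/@notAfter range) could be pre-filled automatically. A minimal sketch, assuming a hypothetical naming convention where a year or year range ("_1846" or "_1846-48") ends the filename. Not every file follows this yet, so undated names just come back with no dates.

```python
import os
import re

# Hypothetical convention: a trailing "_YYYY", "_YYYY-YY" or "_YYYY-YYYY".
YEAR_RANGE = re.compile(r"_(\d{4})(?:-(\d{4}|\d{2}))?$")

def map_id_and_dates(filename):
    """Return (xml_id, not_before, not_after) for a map image filename."""
    # @xml:id is the filename without its extension.
    xml_id = os.path.splitext(os.path.basename(filename))[0]
    m = YEAR_RANGE.search(xml_id)
    if m is None:
        return xml_id, None, None   # no year in the name; date it by hand
    start, end = m.group(1), m.group(2)
    if end is None:
        end = start
    elif len(end) == 2:
        end = start[:2] + end       # "1846-48" -> "1848"
    return xml_id, start, end
```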
Frank and I covered a few things related to the bios.
I asked that he produce roughly a dozen before shipping them to me for copyediting. Also, I asked that he consult Chicago Style for issues related to citation, which we discussed. This is an ongoing process, and, as we reach a point of consistency with all of the above, I will add our "rules" to the guidelines document.
Finished tagging most common placenames and people names in all 1851 files, as well as all dates. Today was spent on the rest. Caitlin continues to work ahead on the vessels, and all is going smoothly.
Went through the library's map set and substituted their stitched versions for all the fragments in my set. I now have a complete set of the maps I want to include, organized (roughly) by year.
Caitlin is on board now, and working away on the vessel names. It is our hope that she will have time to tag all vessels with a unique ID for all years up to and including 1858; she will also write placeholder information entries for each new vessel (a quick process). Once this first round is complete, and should time allow, Caitlin will begin to draft vessel entries, or write the odd one for a bit of a brain-break from coding. We are happy to have her aboard the Good Ship Despatches!
I have completed place-name write-ups for 50 places, and have added placeholder entries for several since, which puts us at 71 place-name records, and growing. As I turn my attention back to working through the years ahead, I will diverge on Fridays, when time allows, to catch up on lagging place-names.
Incidentally, I would have begun this work sooner, but I wanted to see how best to work with Caitlin's skills and leanings. Now that we see she is keen on vessels, I can dig into tagging names, places, and so on, in 1851 and beyond.
Incorporated the latest images (1851 and 1852) into the db and uploaded the three sets of jpegs to the server.
Next I have to replace the fragmented ones with the stitched copies created by the library.
Dating is going to be a problem. Some maps have no dates at all, or can only be dated by inference, but others have multiple dates -- dates of the original surveys, dates of adjustments added later, and date of publication. I think the only solution is to list each <date> with a @type attribute specifying what they mean, and also include one <date> element with no @when, but with @notBefore and @notAfter, to specify the complete range.
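A minimal sketch of that scheme using Python's ElementTree, assuming dates are ISO 8601 strings (YYYY, YYYY-MM or YYYY-MM-DD), which sort correctly as plain text. The @type values in the example are made up.

```python
import xml.etree.ElementTree as ET

def map_dates(typed_dates):
    """Build <date> elements from (type, iso_date) pairs: one typed <date>
    per pair, plus one with no @when whose @notBefore/@notAfter span them all."""
    if not typed_dates:
        return []   # undated map: the range has to be an educated guess
    elements = [ET.Element("date", {"type": t, "when": d}) for t, d in typed_dates]
    ordered = sorted(d for _, d in typed_dates)
    elements.append(ET.Element("date", {"notBefore": ordered[0], "notAfter": ordered[-1]}))
    return elements

for el in map_dates([("survey", "1846"), ("revision", "1847"), ("publication", "1848")]):
    print(ET.tostring(el, encoding="unicode"))
```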
I've got as far as CD #15, item FO925-1650 pt1 (10). Two and a half CDs to go. Many are mislabelled or misdated.
I noticed that the "Mentions of this x in the documents" link was not showing up in the popup version of the person, place or vessel info, although it was still there in the index pages. This proved to be a by-product of my move to make all those lookups into AJAX requests; in the process, I had taken the individual items out of the context of the <list> or <listPerson> element they are normally contained by, and this had the extra effect of failing to trigger some of the required rendering code. When I put them back into a list, the "Mentions" link came back, along with some good display features (colour-coded titles depending on the type of item). This, though, meant that they ended up displayed as list items, so they were offset to the left and had a bullet point. I've now added some CSS to eliminate that problem in the context of the popup.
A productive week. I managed to optimize the remaining images in the 1851-1852 folder, which are bundled together in the microfilm images. So, that makes roughly 1,200 separate images for the years stated. As for Friday, I have been catching up on editing and writing placename entries with speedy results. The aim is to keep them short, but as well-researched as possible, and cited fully along the way. Thank the gods for Andrew Scott's recent book, The Encyclopedia of Raincoast Placenames.
I am told that we should have a new RA's labour to add to the fray in the next week or so. This will help speed up the mechanics of the machine to great effect!
The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.