Archives for: March 2012


Permalink 02:06:00 pm, by mholmes, 163 words, 112 views   English (CA)
Categories: Activity log; Mins. worked: 30

map_lookup.xml done

Simple XQuery to pull out the data:

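xquery version "1.0";

declare default element namespace "";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: The bodies of the two conditionals did not survive in this post; the
   output element names <title> and <penfoldNum> are assumptions. :)
<maps xmlns="">{
  for $t in //tei:TEI
  return
    <map xml:id="{$t/@xml:id}">{
      if ($t//tei:title) then
        <title>{normalize-space(($t//tei:title)[1])}</title>
      else (),
      if ($t//tei:idno[@type="penfoldNum"]) then
        <penfoldNum>{normalize-space(($t//tei:idno[@type="penfoldNum"])[1])}</penfoldNum>
      else ()
    }</map>
}</maps>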


I might have to add more data points to the output; in fact it might be worth just pulling out the whole of the sourceDesc. I'm currently looking at the possibility of enhancing my UniSymMetric Java class so it could be called as an extension function from XSLT in Saxon; that would give me a fallback when there's no Penfold number, and it might be handy in all sorts of other ways too.

Permalink 10:56:36 am, by mholmes, 306 words, 106 views   English (CA)
Categories: Activity log; Mins. worked: 60

Importing metadata from ContentDM

JD pointed me at an OAI feed from ContentDM, which is exactly what I need for my metadata harvesting. This is my plan:

I've started work on an XSLT stylesheet to do the job. Its purpose is to turn detailed OAI metadata records, which use Dublin Core identifiers, into teiHeader elements suitable for adding to the TEI documents in the Despatches project.

The OAI metadata is in the file oai_from_contentdm.xml, and originates in the UVic Library's ContentDM system. It contains 261 records relating to Early BC Maps, and most of these are maps also in the Colonial Despatches project collection. The ContentDM metadata is well-organized and has been considerably enhanced, so we're going to take that data and generate new teiHeader elements for our TEI files from it.

The first stage is to create a mapping between each of the fields in the OAI data and the location in the teiHeader where we propose to store it.

Input documents:

  • oai_from_contentdm.xml (OAI record set).
  • ../xml/maps/*.xml (TEI documents for each of the maps)
  • map_lookup.xml (simple XML document which should provide enough data to allow this transformation process to retrieve the correct TEI document for each record in the OAI data. This lookup will be based on a number of factors, including Penfold number, title, and descriptive information. Creating this file is the next stage in the process.)

Output documents:

  • ../xml/maps_enhanced/*.xml (from each TEI document we have, create an enhanced version which incorporates the original @xml:id and metadata, as well as the facsimile element with data about the image file, but also builds in the metadata gleaned from the OAI file. These files will eventually replace the original TEI files in the Despatches site, once the Map Gallery code has been rewritten to work with them.)
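
The lookup step described above could work roughly as in the following Python sketch (Python here only for illustration; the real job will be done in XSLT). Everything about the data layout is an assumption: that map_lookup.xml holds map elements with an @xml:id, a title and a penfoldNum child, that the OAI records are in the standard oai_dc format, and that the Penfold number appears in a dc:identifier as "Penfold NNN". The sample ids, titles and numbers are invented. The title fallback uses difflib from the standard library as a stand-in similarity measure.

```python
import difflib
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample data; the element names are assumed, not the real file layout.
LOOKUP_XML = """<maps>
  <map xml:id="map_a"><title>Sample harbour chart</title><penfoldNum>101</penfoldNum></map>
  <map xml:id="map_b"><title>Plan of the townsite</title><penfoldNum>102</penfoldNum></map>
</maps>"""

OAI_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A chart of the harbour, surveyed 1847</dc:title>
  <dc:identifier>Penfold 101</dc:identifier>
</oai_dc:dc>"""


def load_lookup(xml_text):
    """Return a list of (xml_id, title, penfold) tuples from the lookup document."""
    root = ET.fromstring(xml_text)
    return [(m.get(XML_ID), m.findtext("title", ""), m.findtext("penfoldNum", ""))
            for m in root.findall("map")]


def find_tei_id(record_xml, lookup, min_ratio=0.8):
    """Return the xml:id of the TEI file matching one oai_dc record, or None."""
    record = ET.fromstring(record_xml)
    # First choice: an exact Penfold number match.
    for ident in record.findall(DC + "identifier"):
        found = re.search(r"Penfold\s+(\d+)", ident.text or "")
        if found:
            for xml_id, _title, penfold in lookup:
                if penfold == found.group(1):
                    return xml_id
    # Fallback: the most similar title, if it is similar enough.
    title = (record.findtext(DC + "title") or "").lower()
    best_id, best_ratio = None, 0.0
    for xml_id, lookup_title, _penfold in lookup:
        ratio = difflib.SequenceMatcher(None, title, lookup_title.lower()).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = xml_id, ratio
    return best_id if best_ratio >= min_ratio else None


print(find_tei_id(OAI_RECORD, load_lookup(LOOKUP_XML)))
```

Trying the Penfold number first and falling back to title similarity only when it is missing keeps the fuzzy step away from records that can be matched exactly; the 0.8 threshold is arbitrary and would need tuning against the real data.
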
Permalink 08:25:26 am, by mholmes, 178 words, 86 views   English (CA)
Categories: Activity log; Mins. worked: 30

Map confusion and metadata

Adding this as a task for me, long-term, because it needs to be part of the plan for the next phase of the project.

I had pointed JT at fo_925-1650_pt_1_24_vic_harbour_1847, which is Penfold 576, for the Kellett map of Victoria Harbour, but it turns out he wanted Penfold 577, which is fo_925-1807_vic_1848. I've slightly enriched the metadata for 577 using data from ContentDM, manually, but there should be a way to do this mechanically because the ContentDM metadata is organized into clear fields. Ultimately, it would be a good idea to find some way to get at this metadata and pull it into our headers, so we'll have to write a mapping between the two. Here's an example of the ContentDM data in HTML:

It claims to be XHTML, but it's not even well-formed, never mind valid, so it couldn't be parsed with e.g. XSLT unless it was tidied first. Hopefully there's a more helpful feed from it. I'm contacting JD about that.


Permalink 05:10:57 pm, by mholmes, 27 words, 85 views   English (CA)
Categories: Activity log; Mins. worked: 30

Map dates need tweaking

Dating of maps is inconsistent for maps which have a notBefore and/or notAfter. Check them in the sorted gallery, find oddities, and normalize. Did some today.
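
A check like this could be partly mechanized. The Python sketch below is only an illustration: it assumes the dates are TEI date elements in the TEI namespace with ISO-style attribute values, and it treats two things as oddities (a date with only one of the two bounds, and a notBefore later than its notAfter). What actually counts as an oddity in the gallery is a judgment call, so those rules are assumptions.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample header fragment for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
  <date notBefore="1846" notAfter="1848"/>
  <date notBefore="1850"/>
  <date notBefore="1860" notAfter="1858"/>
</teiHeader></TEI>"""


def date_oddities(tei_xml):
    """Return (notBefore, notAfter, problem) tuples for suspicious date elements."""
    problems = []
    for date in ET.fromstring(tei_xml).iter(TEI + "date"):
        not_before, not_after = date.get("notBefore"), date.get("notAfter")
        if not_before is None and not_after is None:
            continue  # no range given, so nothing to check here
        if not_before is None or not_after is None:
            problems.append((not_before, not_after, "only one bound given"))
        elif not_before > not_after:
            # ISO strings of equal precision sort chronologically as plain text.
            problems.append((not_before, not_after, "notBefore is later than notAfter"))
    return problems


for problem in date_oddities(SAMPLE):
    print(problem)
```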

Permalink 04:46:35 pm, by mholmes, 48 words, 88 views   English (CA)
Categories: Activity log; Mins. worked: 60

Housekeeping and bugfixing

Did some auditing of the "Marion's transcriptions" spreadsheet that we're using to keep track of the transcriptions awaiting markup, since PCA has been working on these; checked filenames and made updates and notes where appropriate. Also fixed file naming issue reported by PCA, and did some other housekeeping.

Permalink 11:05:05 am, by mholmes, 194 words, 169 views   English (CA)
Categories: Activity log, Documentation; Mins. worked: 60

Adding maps to the site

JT provided two new maps for the gallery, so I've added those. I had to refresh my memory of the procedure for doing this, so I'll detail it here:

  • Extract the bitmaps from the PDFs (if that's the format they come in) using pdfimages -j [pdffile] [outputprefix].
  • Create meaningful filenames based on repo, id numbers, and year.
  • Copy the full-sized originals into the correct year in [coldesp]/maps on the local drive. These will just be backed up locally.
  • Create a quarter-sized "large" image (max width 5000) in maps_lg.
  • Create a 1000px-wide version in maps_1000.
  • Create a 200px-wide version in maps_200.
  • Create a 100px-wide version in maps_thumb.
  • Create an XML file with the same name as the image file, and a matching @xml:id. It's simplest to model this on an existing file. Save it in xml/maps.
  • Fill out the metadata, and point the facsimile graphic at the right file name, with the right dimensions.
  • Add the XML file to SVN and commit it.
  • Upload the images to home1t, and the XML file into the db.
  • Test to make sure the map shows up in the gallery, and works properly on the site.
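
The sizes of the derivative images in the steps above can be worked out ahead of time, which also gives the dimensions needed for the facsimile graphic. This Python sketch is only arithmetic and makes two assumptions: that "quarter-sized" means a quarter of the original width, and that an image narrower than a target width is left at its own width rather than enlarged. The function name is made up for the example.

```python
def derivative_sizes(width, height):
    """Return {folder: (width, height)} for each derivative of a full-sized scan."""
    def scaled(target_width):
        # Assumption: never upscale, so cap the target at the original width.
        new_width = min(target_width, width)
        return (new_width, max(1, round(height * new_width / width)))

    return {
        # Assumption: "quarter-sized" is a quarter of the width, capped at 5000px.
        "maps_lg": scaled(min(max(1, width // 4), 5000)),
        "maps_1000": scaled(1000),
        "maps_200": scaled(200),
        "maps_thumb": scaled(100),
    }


for folder, size in derivative_sizes(8000, 6000).items():
    print(folder, size)
```
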
Permalink 08:57:36 am, by mholmes, 43 words, 53 views   English (CA)
Categories: Activity log; Mins. worked: 15

Five more documents assigned to PCA

I've assigned the first five 1859 documents transcribed by MM to PCA; the 1858 documents are rather complicated, and the existing 1858 documents need some editing, so it's simpler to work on the 1859 documents for the moment. The Google spreadsheet records the status of each document.


Permalink 02:05:11 pm, by mholmes, 32 words, 175 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 10

Task: renaming of file in SVN and in db

DONE: The transcription of the document 58-01-21_HBC748.rtf is marked up as the file V585MI30, when it should be V585MI02_A. It is already up on the site.


Permalink 04:24:32 pm, by mholmes, 87 words, 83 views   English (CA)
Categories: Activity log; Mins. worked: 60

Linked 26 vessels from Schedules

All vessels referred to in the Schedules which have obvious existing vessel bios have now been linked (including one correction to a typo, "Fartar" instead of "Tartar"). The remaining vessels, for which new vessel bios will be required, are:

East Lotherian
John Bright
John Stephenson
John Stevenson
Nanaimo Packet
Prince of the Seas
Royal Charlie

It's likely that the John Stephenson and John Stevenson are the same vessel, and possible that they're actually the John Stevens.

Permalink 03:54:23 pm, by mholmes, 44 words, 84 views   English (CA)
Categories: Activity log; Mins. worked: 30

Changed William Allen xml:id

The William Allen was tagged as "william", which made it confusable with the Brig William ("william_brig"). I've now changed the vessel bio and all references to it to show "william_allen". Also fixed an encoding issue in an 1854 document that I stumbled across.

Permalink 03:28:03 pm, by mholmes, 25 words, 363 views   English (CA)
Categories: Announcements; Mins. worked: 0

Abstracts now added for 1854

Thanks to some excellent work from Petria Arienzale, abstracts have now been added for all 1854 documents. We now have abstracts for all years between 1846 and 1854.


Permalink 10:44:20 am, by mholmes, 20 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 60

Latest review for PCA

Reviewed PCA's latest work (excellent) and sent comments. Also noticed a couple of issues in other documents and fixed them.


Permalink 01:43:51 pm, by mholmes, 35 words, 253 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 5

Change William Allen id to "william_allen"

DONE 2012-03-26: The xml:id for the William Allen is currently "william", which is very confusing; change it to "william_allen", and change refs to it, so it's not confused with the Brig William.


Permalink 09:46:45 am, by mholmes, 99 words, 172 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 175

Need to check linking of vessels

NOTE: Completed 2012-04-23. Many new vessel entries have resulted from this work, and they will need to be completed when time permits.

Try this, first in /db/coldesp/correspondence, and then in /db/coldesp/:

xquery version "1.0";

declare default element namespace "";

for $r in //name[@type='vessel'][not(@key)]
return $r

The vessel tags inside the correspondence seem mainly to be for vessels which HAVE write-ups; these should simply be correctly linked with @key. The broader set includes vessels which may not have bios yet; bios need to be created, and those vessels linked.
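
For working outside the database, the same check can be run over a file with a few lines of Python. This is a sketch rather than part of the project code: it assumes, like the query above as shown, that the name elements are in no namespace, and the sample fragment is invented (the vessel names are borrowed from this blog). It tallies the unlinked names so that the most frequent ones can be dealt with first.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample fragment for illustration.
SAMPLE = """<text>
  <p>The <name type="vessel">Nanaimo Packet</name> arrived before the
     <name type="vessel" key="william_allen">William Allen</name>.</p>
  <p>The <name type="vessel">Nanaimo
     Packet</name> and the <name type="vessel">Royal Charlie</name> sailed.</p>
  <p><name type="person">Douglas</name> wrote.</p>
</text>"""


def unlinked_vessels(xml_text):
    """Count vessel name elements that have no key attribute, by normalized text."""
    counts = Counter()
    for name in ET.fromstring(xml_text).iter("name"):
        if name.get("type") == "vessel" and name.get("key") is None:
            # Collapse line breaks and runs of spaces inside the name.
            counts[" ".join("".join(name.itertext()).split())] += 1
    return counts


for vessel, count in unlinked_vessels(SAMPLE).most_common():
    print(count, vessel)
```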

Permalink 09:09:52 am, by mholmes, 191 words, 87 views   English (CA)
Categories: Activity log; Mins. worked: 30

TNB's report at end of workstudy

This is the state of play on TNB's work as of today:

  • Peripheral bios will all be finished except for one:
    • The entry is gordon_t (Captain George T Gordon).
    • He was captain of the Cormorant, on station in Nisqually in 1846.
    • Gordon Lake was named after him.
    • More research is required to complete his bio.
  • B58 bios: references all switched to Chicago style, and minor edits done up to storks_hk. Old references have just been commented out. Sometimes better references have been added, from a more recent source.
  • A lot of citations for the revised bios still need to be checked in hard copies in the library; sometimes the library will have a different edition, and page numbers may have to be changed.
  • Many, many bios remain to be completed (more than two thirds).
  • Many bios refer to BCDES and could be linked to page-images we have (e.g. the bio for shepherd_j), but we currently lack a system to link from editorial text to a page-image. This needs to be implemented, and BCDES references linked and clarified.
  • Vessels and placenames are up to date to the end of 1861.


Permalink 02:06:30 pm, by mholmes, 409 words, 143 views   English (CA)
Categories: Activity log; Mins. worked: 30

Addressing addressees

There are issues with the search engine relating to both authors and addressees of correspondence. The drop-down lists are generated from distinct values of tags in the header. These tags, inherited from the Waterloo Script, contain plain text, and so the same individual is identified in a variety of different ways. It would be helpful if we could tag these names with ids from the personography, and then build our search engine drop-downs in a more intuitive fashion.

It seems best to start with the addressees, since they constitute a much smaller number (only 89 distinct values, listed below). The simplest approach would be this:

  • Create an XML file listing the referents (or just use the search_lists.xml file).
  • Identify each referent and tag it with the appropriate id from the personography.
  • Create a default personography entry for completely unknown people, uncertain people and missing people.
  • Fix any known oddities (like the square brackets around Carnarvon in one document).
  • Write an identity transform that adds the appropriate id to all files.
  • Update the search form generator so that it pulls appropriate info from the personography based on the distinct values of the name/@key attributes.
  • Update the search form and the search to use the new feature.


  • [Carnarvon]
  • [None]
  • [Unknown; Eliot?]
  • [Unknown]
  • [Various]
  • Adderley (Parliamentary Under-Secretary)
  • Assistant Secretary of State
  • Assistant Under-Secretary
  • Ball (Parliamentary Under-Secretary)
  • Banister
  • Barclay
  • Begbie, Thomas
  • Birch
  • Birch (Assistant Clerk)
  • Blackwood
  • Blackwood (Chief Clerk)
  • Blackwood (Senior Clerk)
  • Blanshard
  • Buckingham
  • Cardwell
  • Carnarvon
  • Carnarvon (Parliamentary Under-Secretary)
  • Chief Clerk
  • Clerk
  • Colonial Office
  • Colonial Secretary
  • Desart (Parliamentary Under-Secretary)
  • Douglas
  • Duke of Argyle
  • Earl Grey
  • Elliot (Assistant Under-Secretary)
  • Elliot (Permanent Under-Secretary)
  • Fortescue (Parliamentary Under-Secretary)
  • Gairdner (Chief Clerk)
  • General Public
  • Gladstone, R.
  • Granville
  • Graves, S.R.
  • Grey
  • Grey, Sir George
  • Hankin
  • Hawes
  • Hawes (Parliamentary Under-Secretary)
  • Head Clerk
  • Herbert (Assistant Under-Secretary)
  • Herbert (Permanent Under-Secretary)
  • Herman Merivale
  • Herman Merivale, Esq.
  • Herman Merivale, Esq. Under Secretary of State for Colonial Affairs
  • Higgins (Private Secretary)
  • Holland (Assistant Under-Secretary)
  • House of Commons
  • Irving (Junior Clerk)
  • Kennedy
  • Kimberley
  • Labouchere
  • Lytton
  • Merivale
  • Merivale (Permanent Under-Secretary)
  • Molesworth
  • Monsell (Parliamentary Under-Secretary)
  • Musgrave
  • Newcastle
  • Officer Administering
  • Pakington
  • Palliser
  • Palmerston, Viscount (Secretary of State, Treasury)
  • Palmerston (Treasury)
  • Parker (Private Secretary)
  • Peel (Parliamentary Under-Secretary)
  • Peel (Under-Secretary)
  • Pelly
  • Prince of Wales
  • Queen Victoria
  • Robinson (Senior Assistant Clerk)
  • Rogers
  • Rogers (Permanent Under-Secretary)
  • Russell
  • Sandford (Assistant Under-Secretary)
  • Secretary of State
  • Secretary of State for Foreign Affairs
  • Seymour
  • Smith
  • Stanley
  • Stanley (Foreign Office)
  • Under-Secretary for the Colonies
  • Under-Secretary of State
  • Under-Secretary of State Foreign Office [sic]
  • Young
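
As a first pass at identifying the referents, the values above could be split into a name and an optional role, and the variants grouped by name. The Python sketch below is an assumption about how that might be done, not the planned identity transform, and the function names are made up. It drops editorial square brackets around a whole value and treats a trailing parenthesis as the role.

```python
import re
from collections import defaultdict


def split_addressee(raw):
    """Split a raw header value into (name, role); role is None if absent."""
    text = raw.strip()
    # Editorial square brackets around the whole value, as in "[Carnarvon]".
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1].strip()
    # A trailing parenthesis is taken to be the role, e.g. "(Chief Clerk)".
    found = re.fullmatch(r"(.*?)\s*\(([^()]*)\)", text)
    if found:
        return found.group(1), found.group(2)
    return text, None


def group_variants(values):
    """Group raw values that share the same name part."""
    groups = defaultdict(list)
    for raw in values:
        name, _role = split_addressee(raw)
        groups[name].append(raw)
    return dict(groups)


sample = ["Blackwood", "Blackwood (Chief Clerk)", "Blackwood (Senior Clerk)",
          "[Carnarvon]", "Carnarvon"]
for name, variants in group_variants(sample).items():
    print(name, variants)
```

The grouping only suggests candidates. A shared name does not prove a shared person (Grey, Earl Grey and Grey, Sir George all appear in the list), so each group would still need to be checked by hand before an id from the personography is assigned.
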
Permalink 01:51:28 pm, by mholmes, 55 words, 236 views   English (CA)
Categories: Activity log; Mins. worked: 20

Consistency edits to XML files

Following one of KSW's notes in this post, removed date tags from a specific location in 17 files. This is presumably for consistency -- only 17 files had them -- and because I suspect some useful parsing can be done (or is already being done) on the assumption that the first date in the text is the date the document was penned.

Permalink 01:19:12 pm, by mholmes, 74 words, 82 views   English (CA)
Categories: Activity log; Mins. worked: 30

Added helpful message for when mentions not found

Items in the indexes have a link under their info popup which enables you to retrieve references to them in the correspondence, but sometimes there are no references (as in the case of peripheral bios, which are referred to in other bios but not in the actual correspondence). Previously, clicking on the "Mentions..." link simply did nothing in these cases, but I've now added a trap for this condition and an appropriate error message.


Permalink 02:48:49 pm, by mholmes, 108 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 120

Work on documentation and crediting MM

Added appropriate credit to MM for her transcription work, and began the process of pulling documents from Google Docs into the actual repo, which is a bit easier to keep track of. Found one suitable document to get PCA started with full-doc transcription, and created a simple guide to the file/id/naming convention for our collection. Wrote a detailed assignment for PCA and sent it. This process will include a check that our Guidelines document in fact provides enough guidance for encoding a complete new document. Most likely we will be expanding it in the next week or two as PCA starts to add new transcriptions.


Permalink 10:46:52 am, by mholmes, 47 words, 95 views   English (CA)
Categories: Activity log; Mins. worked: 90

PCA's directed reading report

Reviewed the extensive (and excellent) work completed by PCA, who is now nearly at the end of the 1854 abstracts. Wrote a number of notes for tweaks and fixes, as well as a couple of requests for further research and the transcription of a mysteriously-untranscribed despatch (V547102A).


Permalink 05:40:53 pm, by mholmes, 51 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 45

Error with vessel info

PCA reported that mentions of the Brig William, wrecked in 1854, are linked to the vessel info for the William Allen, which is not the same ship at all. We dug around to find some references from which to construct a new vessel entry, and she's now going ahead with writing it.


Permalink 01:51:20 pm, by mholmes, 11 words, 80 views   English (CA)
Categories: Activity log; Mins. worked: 20

Retrieved stats

Note: Francotoile and the Mysteries projects are not showing any stats.

Colonial Despatches

The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at


