Archives for: 2012

28/11/12

Permalink 04:28:12 pm, by mholmes, 12 words, 113 views   English (CA)
Categories: Activity log; Mins. worked: 30

Fix to stats page calculation

Updated the way the stats page counts non-complete place and vessel entries.

21/11/12

Permalink 10:05:02 am, by mholmes, 40 words, 669 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO 398 Vol 2 page images added to the Colonial Despatches collection

539 page images for CO 398 Vol 2 (in three different sizes) have been added to the collection. These cover the BC 1861-1867 Entry Books of Correspondence: Letters from Secretary of State and Despatches. These will now be linked into the transcription documents.

19/11/12

Permalink 11:34:09 am, by mholmes, 205 words, 103 views   English (CA)
Categories: Activity log; Mins. worked: 60

Updated OAI metadata records

Refreshed all the OAI metadata records. I'm going to document this process since I don't seem to have documented it in the past.

  • Start up a local Tomcat with the "coldesp2" webapp in its webapps directory.
  • Connect with the JNLP client.
  • Clean out and refresh all the XML files in the db (watch out for deleted despatch records, where for instance we've found dupes). For large collections such as correspondence and oai/records, it's quickest to delete the collection and re-create it. Leave the records collection empty.
  • Go to the query interface, and paste a copy of oai_update.xq into it. Run it.
  • When it's complete, download the records collection and replace the local copy with its contents.
  • Commit changes to SVN. Look out for the need to delete an old oai record where a despatch may have disappeared.
  • Change the oai/meta/identify.xml file to update the earliestDatestamp element appropriately.
  • Upload that file and all the records files to the live db. Again, it's quickest to delete the records collection and re-create it. This also ensures that obsolete records are removed.

The generation process used to take about four hours on my previous workstation; this time it took only half an hour.

01/11/12

Permalink 09:46:40 am, by mholmes, 35 words, 83 views   English (CA)
Categories: Activity log; Mins. worked: 30

Various typo fixes in encoding

While doing stats (see previous post), I found some oddities in the name encoding. These should be constrained by the schema, so I'm going to have a look at the possibility of rebuilding the schema accordingly.

Permalink 09:10:19 am, by mholmes, 69 words, 106 views   English (CA)
Categories: Activity log, Documentation; Mins. worked: 60

Stats for this round of funding

Generated these stats for CP's report on this round's grant funding:

Images processed so far this round for 1861:

CO 60:10, 60:11, 60:12
CO 305:17, 305:18
RG7 G8C:21

for a total of 4369 images, at 3 sizes = 13107.

1317 links to page-images have been added to the 404 documents for 1861.

According to my calculations, so far in 1861, 7150 names of people, places, and vessels have been linked:

5252 people
  65 vessels
1833 places 
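
As a quick sanity check on the totals above, the arithmetic can be recomputed in a few lines of Python. This is only a sketch: the variable names are illustrative, and the figures are simply copied from this post.

```python
# Figures copied from the post above; names are illustrative only.
images = 4369
sizes_per_image = 3
linked_names = {"people": 5252, "vessels": 65, "places": 1833}

# Total image files across all three sizes.
total_image_files = images * sizes_per_image

# Total linked names across people, vessels, and places.
total_linked = sum(linked_names.values())

print(total_image_files, total_linked)
```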

KSW will do some calculations for the next application, for 1862.

31/10/12

Permalink 02:21:36 pm, by kim, 119 words, 80 views   English (CA)
Categories: Documentation; Mins. worked: 0

"Secret" document type removed

We discovered that one of the old scripts we used to convert the documents ran amok a little and added a false "documentType" value of "Secret," likely because the script assumed that "Secretary" counted as "Secret"!

We removed <idno type="documentType">Secret</idno> from 1,862 files. Revision number prior to this mass-fix: 990. First revision number with ONLY this fix: 991.

Important: there are actually 6 "Secret" files. These documents have <head> elements containing "Secret" but not containing "Secretary":

  • B61023SP.scx
  • B67067SC.scx
  • B67128AS.scx
  • B68058SC.scx
  • B697061A.scx
  • V61025SC.scx

We have added the <idno type="documentType">Secret</idno> to these files; this is revision number 992.
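
The original conversion script is not shown here, so the following is only a guess at the kind of test that went wrong, alongside the criterion described above ("Secret" present, "Secretary" absent). The function names and sample head texts are hypothetical.

```python
def naive_is_secret(head: str) -> bool:
    # Plain substring test: also fires on "Secretary" (the suspected bug).
    return "Secret" in head


def is_secret(head: str) -> bool:
    # Criterion described in the post: contains "Secret" but not "Secretary".
    return "Secret" in head and "Secretary" not in head


# Hypothetical <head> contents, for illustration only.
sample_heads = ["Secret", "Letter from the Secretary of State"]
flags = [(naive_is_secret(h), is_secret(h)) for h in sample_heads]
```

Note that this criterion would miss a genuinely secret document whose head also mentions a Secretary, so a manual check of borderline cases is still sensible.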

Permalink 08:45:47 am, by mholmes, 46 words, 160 views   English (CA)
Categories: Activity log, Announcements, Documentation; Mins. worked: 15

Colonial Despatches: Encoding guidelines document available

After some discussion and a request from a user, we've decided to make our encoding guidelines document available on the site. It is, of course, in a state of continuous evolution, so we'll refresh the PDF periodically. A link has been added to the Development page.

02/10/12

Permalink 09:15:37 am, by mholmes, 7 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 15

Retrieved website stats

Gathered stats up to end of September.

13/09/12

Permalink 03:41:35 pm, by mholmes, 63 words, 365 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 120

Welcome to our new Directed Reading students

The Colonial Despatches project would like to welcome our two new team members, Alison Malis (doing a Directed Reading in the History Department) and Brigitte Dreger-Smylie (Directed Reading/Professional Writing Program). We also welcome back Theo Biggs, previously doing Directed Reading but now as a workstudy student, entering his third year with us. There should be lots of activity over the coming semester!

11/09/12

Permalink 01:22:40 pm, by mholmes, 27 words, 97 views   English (CA)
Categories: Activity log; Mins. worked: 30

Fix to broken stats page rows

Stats for complete, incomplete and unavailable bios were being incorrectly calculated following our change to the use of persName/@type recently. Reported by KSW, and now fixed.

04/09/12

Permalink 10:27:18 am, by mholmes, 66 words, 79 views   English (CA)
Categories: Activity log; Mins. worked: 60

Fix for broken schedule links and missing document scenario

The links from schedules not reliably connected with a document id were failing with an inscrutable error, as was any URL which didn't actually point at a document (where a sort of 404 would be expected). I've now fixed that, so that a cleaner "not found" document appears, and schedules with plausible target documents (based on despatch numbers and dates) actually jump to the first plausible document.

15/08/12

Permalink 12:35:19 pm, by mholmes, 61 words, 85 views   English (CA)
Categories: Activity log; Mins. worked: 30

Status 'incomplete' and 'unavailable' for bios etc.

KS-W and I clarified and extended the existing system for classifying bios, and place and vessel definitions, and I updated the XSLT accordingly:

  • unavailable: No content has been written yet.
  • incomplete: Some content exists, but it is unfinished or inadequate, and will be rewritten in future.

A number of existing bio entries will be reclassified from incomplete to unavailable by KS-W.

14/08/12

Permalink 11:11:33 am, by mholmes, 53 words, 56 views   English (CA)
Categories: Activity log; Mins. worked: 60

Meeting on future for maps

Met with JL, CP, IO'C, and DB-M re the georeferencing of maps from the Coldesp collection and the Library's collection. There will be another meeting in September to thrash out more details, and in February to look at some results from student work in a GIS class on some of the existing maps.

26/07/12

Permalink 03:19:01 pm, by kim, 45 words, 126 views   English (CA)
Categories: Documentation; Mins. worked: 0

Duplicate files in 1861

We will need to produce duplicates for some of the files in the 1861 collection, specifically, for documents that appear as letter-book copies in 398/1 and as originals in the RG7 G8C 9 collection.

We will handle this process as we have done before in previous collections.

04/07/12

Permalink 09:10:50 am, by mholmes, 44 words, 123 views   English (CA)
Categories: Activity log; Mins. worked: 30

Fix to title of LAC

A reader pointed out that we have two competing abbreviations for what is now Library and Archives Canada: LAC and the older NAC. We have now replaced all instances of NAC with LAC, and updated the search engine to take account of the change.

13/06/12

Permalink 10:36:16 am, by mholmes, 39 words, 128 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 20

RG7 G8C vol 21 page images added to the Colonial Despatches collection

282 page images for RG7 G8C vol 21 (in three different sizes) have been added to the collection. These cover the Despatches to London July 1859 to April 1861 (letterbook copies). These will now be linked into the transcription documents where appropriate.

12/06/12

Permalink 03:26:40 pm, by mholmes, 31 words, 404 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

CO 60 Vol 11 page images added to the Colonial Despatches collection

466 page images for CO 60 Vol 11 (in three different sizes) have been added to the collection. These cover the 1861 Despatches to London Sept-Dec. These will now be linked into the transcription documents.

11/06/12

Permalink 08:46:02 am, by mholmes, 32 words, 346 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO 60 Vol 12 page images added to the Colonial Despatches collection

588 page images for CO 60 Vol 12 (in three different sizes) have been added to the collection. These cover the 1861 Public Offices and Miscellaneous correspondence. These will now be linked into the transcription documents.

30/05/12

Permalink 08:55:55 am, by mholmes, 33 words, 320 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

CO 60 Vol 10 page images added to the Colonial Despatches collection

767 page images for CO 60 Vol 10 (in three different sizes) have been added to the collection. These cover the 1861 Despatches from London, January to August. These will now be linked into the transcription documents.

29/05/12

Permalink 05:37:25 pm, by mholmes, 83 words, 84 views   English (CA)
Categories: Activity log; Mins. worked: 240

ContentDM metadata now imported

Today's progress:

  • Finished the XSLT, and ran it against the whole collection.
  • Went through those files with no matches in the ContentDM repo, and manually ported over info from similar files, and elaborated what was there based on map legends etc.
  • Tweaked the XSLT to add a link to the ContentDM repo.

Still to do: rework the processMapBibl template so that it really uses all of the info that's now there (author, publisher, etc. etc.). This should probably be done with regular templates.

28/05/12

Permalink 10:00:43 am, by mholmes, 189 words, 158 views   English (CA)
Categories: Activity log; Mins. worked: 210

Mapping between ContentDM metadata and TEI

This is the complete mapping for copying metadata over from the ContentDM records to our TEI files:

  • dc:title (multiple): titleStmt/title, bibl/title.
  • dc:description[1]: notesStmt/note (replace the first one).
  • dc:description[preceding-sibling::dc:description][string-length(.) gt 50]: notesStmt/note (add new ones). These are the textual descriptions; the shorter ones are various scale and coordinate details.
  • dc:description[matches(., "^[0-9]+[ 0-9'NW\-\./]+$") and string-length(.) gt 3]: bibl/geo. These one-line expressions of geo locations will have to be further processed into something we can use to map to Google. They're not really in consistent format.
  • dc:subject (multiple) = notesStmt/note type="subject".
  • dc:creator = bibl/author.
  • dc:contributor[not(preceding-sibling::dc:creator)][not(starts-with(., 'Fund'))] = bibl/author.
  • dc:language == 'eng' : bibl/@xml:lang = 'en'
  • dc:language == 'spa' : bibl/@xml:lang = 'es'
  • dc:contributor[starts-with(., 'Fund')] = funder.
  • dc:publisher = bibl/publisher
  • dc:relation = bibl/publisher (really should be repository, but we don't want to get into having a full msIdentifier).
  • dc:identifier[starts-with(., 'http://contentdm')] = idno type="contentdm".
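
The three dc:description rules in the mapping above are the fiddly part, so here is a rough Python sketch of how a single description value might be routed. It is not the XSLT itself: the function name, the target labels, and the sample strings are all hypothetical, and only the regular expression and the length thresholds come from the list above.

```python
import re

# Pattern taken from the mapping above; fullmatch() stands in for the ^...$ anchors.
GEO_PATTERN = re.compile(r"[0-9]+[ 0-9'NW\-\./]+")


def classify_description(text, is_first):
    """Return the TEI targets that one dc:description value would be copied to."""
    targets = []
    if is_first:
        # dc:description[1] replaces the first existing note.
        targets.append("notesStmt/note (replace first)")
    elif len(text) > 50:
        # Later, longer descriptions are textual and become new notes.
        targets.append("notesStmt/note (add)")
    if len(text) > 3 and GEO_PATTERN.fullmatch(text):
        # One-line coordinate expressions go to bibl/geo.
        targets.append("bibl/geo")
    return targets


# Hypothetical sample values, for illustration only.
coords = classify_description("48 25'N 123 22'W", is_first=False)
scale = classify_description("1:24,000", is_first=False)
```

Short later descriptions that are neither long text nor coordinates (such as a scale) get no target, which matches the intent of skipping the scale and coordinate details that cannot be used.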

I'm now halfway through the XSLT which will integrate the metadata into the TEI files. Should be done tomorrow.

25/05/12

Permalink 09:47:00 am, by mholmes, 54 words, 115 views   English (CA)
Categories: Activity log; Mins. worked: 60

Mapping between ContentDM metadata and TEI

This is my preliminary mapping:

  • dc:title (multiple): titleStmt/title, bibl/title.
  • dc:description[1]: notesStmt/note (replace the first one).
  • dc:subject (multiple) = notesStmt/note type="subject".
  • dc:creator = bibl/author.
  • dc:language == 'eng' : bibl/@xml:lang = 'en'
  • dc:language == 'spa' : bibl/@xml:lang = 'es'
  • dc:contributor[starts-with(., 'Fund')] = funder.
  • [ more to come later... ]

24/05/12

Permalink 03:00:26 pm, by mholmes, 113 words, 109 views   English (CA)
Categories: Activity log; Mins. worked: 240

Matching part of the process finished

Spent most of the day manually aligning records between ContentDM and ColDesp, so this is where we're at:

  • DONE: Manually edit the XHTML file to fix bad matches among the candidates.
  • DONE: Search for matches for the unmatched items manually.
  • DONE: Add matches found back into the XHTML.
  • Generate from the XHTML a list of pairings from which metadata can be brought over.
  • Map desired metadata fields in ContentDM OAI file to TEI.
  • Write XSLT to port the metadata into the TEI files.
  • Update the map gallery rendering code to include the new metadata.

Also wrote to CP with a list of 7 maps that we have, but which are apparently missing from ContentDM.

23/05/12

Permalink 03:14:27 pm, by mholmes, 150 words, 102 views   English (CA)
Categories: Activity log; Mins. worked: 240

Matching with ContentDM records

More progress on matching with ContentDM. I've now generated an XHTML file with two tables, one of candidate matches (186 maps) with links to both ColDesp and ContentDM, for human checking, and one of failed matches (33 maps from ColDesp), with ColDesp links and enough metadata for a manual search. I've manually verified the 186 candidate matches and found that most match; I reported one map apparently missing from ContentDM to CP, and found a dupe in ColDesp.

Next steps:

  • Manually edit the XHTML file to fix bad matches among the candidates.
  • Search for matches for the unmatched items manually.
  • Add matches found back into the XHTML.
  • Generate from the XHTML a list of pairings from which metadata can be brought over.
  • Map desired metadata fields in ContentDM OAI file to TEI.
  • Write XSLT to port the metadata into the TEI files.
  • Update the map gallery rendering code to include the new metadata.

22/05/12

Permalink 11:33:43 am, by mholmes, 34 words, 226 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO 305 Vol 18 page images added to the Colonial Despatches collection

910 page images for CO 305 Vol 18 (in three different sizes) have been added to the collection. These cover the 1861 Vancouver Island Public Offices and Miscellaneous Correspondence. These will now be linked into the transcription documents.

10/05/12

Permalink 02:55:36 pm, by kim, 42 words, 137 views   English (CA)
Categories: Tasks; Mins. worked: 15

Apostrophe rendering glitch?

EDIT: Fixed 2012-05-23. In this file, hover over the word "Majesties," which has sic/corr tags around it, the intention being to correct it to "Majesty's." In the hover-over pop-up, the apostrophe renders as the hex-code for an apostrophe. Very strange!

09/05/12

Permalink 01:23:28 pm, by mholmes, 36 words, 277 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 60

Complete set of CO 305 Vol 17 page images added to the Colonial Despatches collection

The complete collection of 1208 page images for CO 305 Vol 17 (in three different sizes) has been added to the collection. These cover the 1861 Vancouver Island Despatches to London. These will now be linked into the transcription documents.

23/04/12

Permalink 04:24:59 pm, by mholmes, 4 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 120

Completed vessel linking from Schedules...

...described in this post.

Permalink 11:12:49 am, by mholmes, 37 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 60

Draft project proposal for 2-year window

Based on our meeting last week, I've drafted a proposal for the HCMC committee for the port of the project to a pure eXist implementation with enhanced searching, NLP topic discovery, etc. Sent to JL for comments.

12/04/12

Permalink 03:53:59 pm, by mholmes, 97 words, 101 views   English (CA)
Categories: Activity log; Mins. worked: 90

Working on integrating similarity metric with Saxon

For many projects it will be useful to have a way of calling a Java library which can make a universal similarity metric measurement of two strings. I've started working from this documentation to create a class and the necessary wrappers to make this work. I'm still trying to resolve some dependencies, but I think this will be practical, and we'll be able to use the USM module in the context of oXygen (where we're allowed to use Saxon EE). The testbed for this will be the matching of ContentDM records with our TEI metadata for maps.

11/04/12

Permalink 03:58:52 pm, by mholmes, 20 words, 92 views   English (CA)
Categories: Activity log; Mins. worked: 60

Meeting with CP and JL: future plans

Put together an immediate and a longer term plan for the project; I'll detail these when I have a chance.

05/04/12

Permalink 02:46:34 pm, by mholmes, 230 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 90

More work on matching ContentDM data with our maps

I've done some preliminary alignment with XSLT to find out which maps we have which can be matched with entries from ContentDM:

  • 176 items have matching Penfold numbers. These would be reliable matches.
  • I've matched a further 9 items based on catalogue ids.
  • One item where we have a Penfold number appears not to have a match in ContentDM. This is #549, mpg_1-557_3_queen_charlotte_sound_1792, which seems to be missing from ContentDM.
  • 76 items in ContentDM have no match (via Penfold) in our collection.
  • In addition to #549, 33 items in our collection have no match in ContentDM.

It seems likely that many of these items actually do match, but because they have no Penfold numbers or matching ids, I'll have to match them with some sort of fuzzy matching approach.
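
A rough sketch of the two-stage approach described above: exact matching on Penfold number first, then a fuzzy fallback on titles. This is Python rather than the XSLT actually used, and `difflib.SequenceMatcher` is only a stand-in for the similarity metric; the record layout, ids, titles, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Case-insensitive ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_maps(ours, theirs, threshold=0.8):
    """Pair our maps with ContentDM records: Penfold number first, then fuzzy title."""
    by_penfold = {r["penfold"]: r for r in theirs if r.get("penfold")}
    pairs, unmatched = [], []
    for m in ours:
        hit, how = by_penfold.get(m.get("penfold")), "penfold"
        if hit is None:
            # Fallback: take the most similar title, if it is similar enough.
            best = max(theirs, key=lambda r: similarity(m["title"], r["title"]), default=None)
            if best is not None and similarity(m["title"], best["title"]) >= threshold:
                hit, how = best, "fuzzy"
        if hit is None:
            unmatched.append(m["id"])
        else:
            pairs.append((m["id"], hit["id"], how))
    return pairs, unmatched


# Hypothetical records, for illustration only.
ours = [
    {"id": "map_a", "penfold": "577", "title": "Victoria Harbour"},
    {"id": "map_b", "penfold": None, "title": "Plan of Esquimalt Harbour"},
    {"id": "map_c", "penfold": None, "title": "Sketch of Fraser River"},
]
theirs = [
    {"id": "cdm_1", "penfold": "577", "title": "Victoria Harbour, 1848"},
    {"id": "cdm_2", "penfold": None, "title": "Plan of Esquimalt harbour"},
]
pairs, unmatched = match_maps(ours, theirs)
```

Fuzzy pairs are only candidates and would still need checking by eye, since a title can resemble the wrong record.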

I regenerated my map_lookup.xml file with a bit of added data:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

<maps xmlns="http://hcmc.uvic.ca">
{
for $t in //tei:TEI
return 
<map xml:id="{$t/@xml:id}">
{
if ($t//tei:title) then
<title>{$t//tei:title[1]/text()}</title>
else
()
}
{
if ($t//tei:idno[@type="penfoldNum"]) then
(
<penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>,
<docId>{$t//tei:idno[@type="doc_id"]/text()}</docId>
)
else
()
}
</map>

}
</maps>

Permalink 09:06:30 am, by mholmes, 17 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 40

Final report for PCA

Completed the report for PCA, who signed off yesterday, and sent it on to SD and EG-W.

04/04/12

Permalink 03:11:16 pm, by mholmes, 27 words, 333 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 60

Four new 1859 documents added to the collection

Four new correspondence documents from 1859 have been added to the correspondence, transcribed by Marion Massey and marked up by Petria Arienzale. The total document count is now 7151.

02/04/12

Permalink 03:27:10 pm, by mholmes, 44 words, 74 views   English (CA)
Categories: Activity log; Mins. worked: 120

Work on missing vessels

Added the first few new vessels to the vessels file, fixing some typos in the original transcription, confirming the existence and naming of the vessels, and finding some sources to get the researcher started. Lots more to do. I'm up to the John Stephenson.

30/03/12

Permalink 02:06:00 pm, by mholmes, 163 words, 101 views   English (CA)
Categories: Activity log; Mins. worked: 30

map_lookup.xml done

Simple XQuery to pull out the data:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

<maps xmlns="http://hcmc.uvic.ca">
{
for $t in //tei:TEI
return 
<map xml:id="{$t/@xml:id}">
{
if ($t//tei:title) then
<title>{$t//tei:title[1]/text()}</title>
else
()
}
{
if ($t//tei:idno[@type="penfoldNum"]) then
<penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>
else
()
}
</map>

}
</maps>

I might have to add more data points to the output; in fact it might be worth just pulling out the whole of the sourceDesc. I'm currently looking at the possibility of enhancing my UniSymMetric Java class so it could be called as an extension function from XSLT in Saxon; that would give me a fallback when there's no Penfold number, and it might be handy in all sorts of other ways too.

Permalink 10:56:36 am, by mholmes, 306 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 60

Importing metadata from ContentDM

JD pointed me at an OAI feed from ContentDM, which is exactly what I need for my metadata harvesting. This is my plan:

I've started work on an XSLT stylesheet to do the job. The purpose of the stylesheet is to process detailed OAI metadata records which use Dublin Core identifiers into teiHeader elements suitable for adding to TEI documents in the Despatches project.

The OAI metadata is in the file oai_from_contentdm.xml, and originates in the UVic Library's ContentDM system. It contains 261 records relating to Early BC Maps, and most of these are maps also in the Colonial Despatches project collection. The ContentDM metadata is well-organized and has been considerably enhanced, so we're going to take that data and generate new teiHeader elements for our TEI files from it.

The first stage is to create a mapping between each of the fields in the OAI data and the location in the teiHeader where we propose to store it.

Input documents:

  • oai_from_contentdm.xml (OAI record set).
  • ../xml/maps/*.xml (TEI documents for each of the maps)
  • map_lookup.xml (simple XML document which hopefully provides enough data to allow this transformation process to retrieve the correct TEI document for each record in the OAI data. This lookup will be based on a number of factors, including Penfold number, title, and descriptive information. Creating this file is the next stage in the process.)

Output documents:

  • ../xml/maps_enhanced/*.xml (from each TEI document we have, create an enhanced version which incorporates the original @xml:id and metadata, as well as the facsimile element with data about the image file, but also builds in the metadata gleaned from the OAI file. These files will eventually replace the original TEI files in the Despatches site, once the Map Gallery code has been rewritten to work with them.)

Permalink 08:25:26 am, by mholmes, 178 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 30

Map confusion and metadata

Adding this as a task for me, long-term, because it needs to be part of the plan for the next phase of the project.

I had pointed JT at fo_925-1650_pt_1_24_vic_harbour_1847, which is Penfold 576, for the Kellett map of Victoria Harbour, but it turns out he wanted Penfold 577, which is fo_925-1807_vic_1848. I've slightly enriched the metadata for 577 using data from ContentDM, manually, but there should be a way to do this mechanically because the ContentDM metadata is organized into clear fields. Ultimately, it would be a good idea to find some way to get at this metadata and pull it into our headers, so we'll have to write a mapping between the two. Here's an example of the ContentDM data in HTML:

http://contentdm.library.uvic.ca/cdm/singleitem/collection/collection5/id/130/rec/2

It claims to be XHTML, but it's not even well-formed, never mind valid, so it couldn't be parsed with e.g. XSLT unless it was tidied first. Hopefully there's a more helpful feed from it. I'm contacting JD about that.

29/03/12

Permalink 05:10:57 pm, by mholmes, 27 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 30

Map dates need tweaking

Dating of maps is inconsistent for maps which have a notBefore and/or notAfter. Check them in the sorted gallery, find oddities, and normalize. Did some today.

Permalink 04:46:35 pm, by mholmes, 48 words, 74 views   English (CA)
Categories: Activity log; Mins. worked: 60

Housekeeping and bugfixing

Did some auditing of the "Marion's transcriptions" spreadsheet that we're using to keep track of the transcriptions awaiting markup, since PCA has been working on these; checked filenames and made updates and notes where appropriate. Also fixed file naming issue reported by PCA, and did some other housekeeping.

Permalink 11:05:05 am, by mholmes, 194 words, 126 views   English (CA)
Categories: Activity log, Documentation; Mins. worked: 60

Adding maps to the site

JT provided two new maps for the gallery, so I've added those. I had to refresh myself on the procedure for doing this, so I'll detail it here:

  • Extract the bitmaps from the PDFs (if that's the format they come in) using pdfimages -j [pdffile] [outputprefix].
  • Create meaningful filenames based on repo, id numbers, and year.
  • Copy the full-sized originals into the correct year in [coldesp]/maps on the local drive. These will just be backed up locally.
  • Create a quarter-sized "large" image (max width 5000) in maps_lg.
  • Create a 1000px-wide version in maps_1000.
  • Create a 200px-wide version in maps_200.
  • Create a 100px-wide version in maps_thumb.
  • Create an XML file with the same name as the image file, and a matching @xml:id. It's simplest to model this on an existing file. Save it in xml/maps.
  • Fill out the metadata, and point the facsimile graphic at the right file name, with the right dimensions.
  • Add the XML file to SVN and commit it.
  • Upload the images to home1t, and the XML file into the db.
  • Test to make sure the map shows up in the gallery, and works properly on the site.
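
The resizing steps above involve a little arithmetic, sketched here in Python. This only computes target dimensions; it does not touch any image files. It assumes "quarter-sized" means a quarter of the original width (capped at 5000px) and that aspect ratio is preserved, neither of which is stated above. The function name is hypothetical; the directory names come from the list.

```python
def derivative_sizes(width, height):
    """Return (width, height) for each derivative directory, keeping aspect ratio."""

    def scaled(target_width):
        return (target_width, round(height * target_width / width))

    # Assumption: "quarter-sized" means a quarter of the width, never above 5000px.
    large_width = min(width // 4, 5000)
    return {
        "maps_lg": scaled(large_width),
        "maps_1000": scaled(1000),
        "maps_200": scaled(200),
        "maps_thumb": scaled(100),
    }


# Example with a hypothetical 12000 x 8000 pixel original.
sizes = derivative_sizes(12000, 8000)
```

The resulting pixel dimensions are also what the facsimile graphic in the XML file needs for whichever version it points at.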

Permalink 08:57:36 am, by mholmes, 43 words, 47 views   English (CA)
Categories: Activity log; Mins. worked: 15

Five more documents assigned to PCA

I've assigned the first five 1859 documents transcribed by MM to PCA; the 1858 documents are rather complicated, and the existing 1858 documents need some editing, so it's simpler to work on the 1859 documents for the moment. The Google spreadsheet records the status of each document.

27/03/12

Permalink 02:05:11 pm, by mholmes, 32 words, 124 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 10

Task: renaming of file in SVN and in db

DONE: The transcription of the document 58-01-21_HBC748.rtf is marked up as the file V585MI30, when it should be V585MI02_A. It is already up on the site.

26/03/12

Permalink 04:24:32 pm, by mholmes, 87 words, 53 views   English (CA)
Categories: Activity log; Mins. worked: 60

Linked 26 vessels from Schedules

All vessels referred to in the Schedules which have obvious existing vessel bios have now been linked (including one correction to a typo, "Fartar" instead of "Tartar"). The remaining vessels, for which new vessel bios will be required, are:

Alexandra
Cameleon
Devastation
East Lotherian
John Bright
John Stephenson
John Stevenson
Kingfisher
Nanaimo Packet
Ossifree
Prince of the Seas
Random
Royal Charlie
Scout
Scylla
Severn
Shenandoah
Sutlej

It's likely that the John Stephenson and John Stevenson are the same vessel, and possible that they're actually the John Stevens.

Permalink 03:54:23 pm, by mholmes, 44 words, 59 views   English (CA)
Categories: Activity log; Mins. worked: 30

Changed William Allen xml:id

The William Allen was tagged as "william", which made it confusable with the Brig William ("william_brig"). I've now changed the vessel bio and all references to it to show "william_allen". Also fixed an encoding issue in an 1854 document that I stumbled across.

Permalink 03:28:03 pm, by mholmes, 25 words, 313 views   English (CA)
Categories: Announcements; Mins. worked: 0

Abstracts now added for 1854

Thanks to some excellent work from Petria Arienzale, abstracts have now been added for all 1854 documents. We now have abstracts for all years between 1846 and 1854.

23/03/12

Permalink 10:44:20 am, by mholmes, 20 words, 51 views   English (CA)
Categories: Activity log; Mins. worked: 60

Latest review for PCA

Reviewed PCA's latest work (excellent) and sent comments. Also noticed a couple of issues in other documents and fixed them.

16/03/12

Permalink 01:43:51 pm, by mholmes, 35 words, 116 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 5

Change William Allen id to "william_allen"

DONE 2012-03-26: The xml:id for the William Allen is currently "william", which is very confusing; change it to "william_allen", and change refs to it, so it's not confused with the Brig William.

15/03/12

Permalink 09:46:45 am, by mholmes, 99 words, 121 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 175

Need to check linking of vessels

NOTE: Completed 2012-04-23. Many new vessel entries have resulted from this work, and they will need to be completed when time permits.

Try this, first in /db/coldesp/correspondence, and then in /db/coldesp/:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";

for $r in //name[@type='vessel'][not(@key)]
return $r

The vessel tags inside the correspondence seem mainly to be for vessels which HAVE write-ups; these should simply be correctly linked with @key. The broader set include vessels which may not have bios yet; bios need to be created, and those vessels linked.
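
For checking a single file outside the database, the same test (vessel names with no @key) can be sketched in Python with the standard library. This is an equivalent of the XQuery above for one document, not a replacement for running it against the db; the function name and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def unlinked_vessels(xml_text):
    """Return the text of every name[@type='vessel'] that has no key attribute."""
    root = ET.fromstring(xml_text)
    return [
        "".join(n.itertext()).strip()
        for n in root.iterfind(".//tei:name[@type='vessel']", TEI_NS)
        if n.get("key") is None
    ]


# Hypothetical fragment, for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<name type="vessel" key="tartar">Tartar</name> and
<name type="vessel">Scylla</name> near <name type="place">Victoria</name>
</p></body></text></TEI>"""
missing = unlinked_vessels(sample)
```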

Permalink 09:09:52 am, by mholmes, 191 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 30

TNB's report at end of workstudy

This is the state of play on TNB's work as of today:

  • Peripheral bios will all be finished except for one:
    • gordon_t, Captain George T Gordon is the entry.
    • He was captain of the Cormorant, on station in Nisqually in 1846.
    • Gordon Lake was named after him.
    • More research is required to complete his bio.
  • B58 bios: references all switched to Chicago style, and minor edits done up to storks_hk. Old references have just been commented out. Sometimes better references have been added, from a more recent source.
  • A lot of citations for the revised bios still need to be checked in hard copies in the library; sometimes the library will have a different edition, and page numbers may have to be changed.
  • Many, many bios remain to be completed (more than two thirds).
  • Many bios refer to BCDES and could be linked to page-images we have (e.g. the bio for shepherd_j), but we currently lack a system to link from editorial text to a page-image. This needs to be implemented, and BCDES references linked and clarified.
  • Vessels and placenames are up to date to the end of 1861.

13/03/12

Permalink 02:06:30 pm, by mholmes, 409 words, 100 views   English (CA)
Categories: Activity log; Mins. worked: 30

Addressing addressees

There are issues with the search engine relating to both authors and addressees of correspondence. The drop-down lists are generated from distinct values of tags in the header. These tags, inherited from the Waterloo Script, contain plain text, and so the same individual is identified in a variety of different ways. It would be helpful if we could tag these names with ids from the personography, and then build our search engine drop-downs in a more intuitive fashion.

It seems best to start with the addressees, since they constitute a much smaller number (only 89 distinct values, listed below). The simplest approach would be this:

  • Create an XML file listing the referents (or just use the search_lists.xml file).
  • Identify each referent and tag it with the appropriate id from the personography.
  • Create a default personography entry for completely unknown people, uncertain people and missing people.
  • Fix any known oddities (like the square brackets around Carnarvon in one document).
  • Write an identity transform that adds the appropriate id to all files.
  • Update the search form generator so that it pulls appropriate info from the personography based on the distinct values of the name/@key attributes.
  • Update the search form and the search to use the new feature.
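
The tagging step in the plan above could be sketched roughly as follows. This is only an illustration in Python, not the transform itself: it assumes addressees are held in <name type="addressee"> elements with no namespace, that the id goes in a key attribute, and that ids such as "merivale_h" and "unknown" exist in the personography (all of these are assumptions made for the example).

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: addressee strings as they appear in the headers,
# mapped to personography ids. The ids here are invented for illustration.
ADDRESSEE_IDS = {
    "Merivale": "merivale_h",
    "Herman Merivale, Esq.": "merivale_h",
    "Merivale (Permanent Under-Secretary)": "merivale_h",
    "[Unknown]": "unknown",
}


def add_addressee_keys(root, mapping):
    """Add a key attribute to every addressee name found in the mapping.

    Returns the list of addressee strings that had no entry in the mapping,
    so that they can be identified by hand and added to the table.
    """
    unmatched = []
    for name in root.iter("name"):
        if name.get("type") != "addressee":
            continue
        # Collapse runs of whitespace so line breaks in the XML do not matter.
        text = " ".join("".join(name.itertext()).split())
        person_id = mapping.get(text)
        if person_id is None:
            unmatched.append(text)
        else:
            name.set("key", person_id)
    return unmatched
```

Everything except the key attribute is left untouched, so the result behaves like an identity transform with one addition. The list of unmatched strings gives a quick check that all 89 distinct values have been accounted for.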

Addressees:

  • [Carnarvon]
  • [None]
  • [Unknown; Eliot?]
  • [Unknown]
  • [Various]
  • Adderley (Parliamentary Under-Secretary)
  • Assistant Secretary of State
  • Assistant Under-Secretary
  • Ball (Parliamentary Under-Secretary)
  • Banister
  • Barclay
  • Begbie, Thomas
  • Birch
  • Birch (Assistant Clerk)
  • Blackwood
  • Blackwood (Chief Clerk)
  • Blackwood (Senior Clerk)
  • Blanshard
  • Buckingham
  • Cardwell
  • Carnarvon
  • Carnarvon (Parliamentary Under-Secretary)
  • Chief Clerk
  • Clerk
  • Colonial Office
  • Colonial Secretary
  • Desart (Parliamentary Under-Secretary)
  • Douglas
  • Duke of Argyle
  • Earl Grey
  • Elliot (Assistant Under-Secretary)
  • Elliot (Permanent Under-Secretary)
  • Fortescue (Parliamentary Under-Secretary)
  • Gairdner (Chief Clerk)
  • General Public
  • Gladstone, R.
  • Granville
  • Graves, S.R.
  • Grey
  • Grey, Sir George
  • Hankin
  • Hawes
  • Hawes (Parliamentary Under-Secretary)
  • Head Clerk
  • Herbert (Assistant Under-Secretary)
  • Herbert (Permanent Under-Secretary)
  • Herman Merivale
  • Herman Merivale, Esq.
  • Herman Merivale, Esq. Under Secretary of State for Colonial Affairs
  • Higgins (Private Secretary)
  • Holland (Assistant Under-Secretary)
  • House of Commons
  • Irving (Junior Clerk)
  • Kennedy
  • Kimberley
  • Labouchere
  • Lytton
  • Merivale
  • Merivale (Permanent Under-Secretary)
  • Molesworth
  • Monsell (Parliamentary Under-Secretary)
  • Musgrave
  • Newcastle
  • Officer Administering
  • Pakington
  • Palliser
  • Palmerston, Viscount (Secretary of State, Treasury)
  • Palmerston (Treasury)
  • Parker (Private Secretary)
  • Peel (Parliamentary Under-Secretary)
  • Peel (Under-Secretary)
  • Pelly
  • Prince of Wales
  • Queen Victoria
  • Robinson (Senior Assistant Clerk)
  • Rogers
  • Rogers (Permanent Under-Secretary)
  • Russell
  • Sandford (Assistant Under-Secretary)
  • Secretary of State
  • Secretary of State for Foreign Affairs
  • Seymour
  • Smith
  • Stanley
  • Stanley (Foreign Office)
  • Under-Secretary for the Colonies
  • Under-Secretary of State
  • Under-Secretary of State Foreign Office [sic]
  • Young

Permalink 01:51:28 pm, by mholmes, 55 words, 195 views   English (CA)
Categories: Activity log; Mins. worked: 20

Consistency edits to XML files

Following one of KSW's notes in this post, removed date tags from a specific location in 17 files. This is presumably for consistency -- only 17 files had them -- and because I suspect some useful parsing can be done/is being done based on the first date in the text being the date the document was penned.

Permalink 01:19:12 pm, by mholmes, 74 words, 56 views   English (CA)
Categories: Activity log; Mins. worked: 30

Added helpful message for when mentions not found

Items in the indexes have a link under their info popup which enables you to retrieve references to them in the correspondence, but sometimes there are no references (as in the case of peripheral bios, which are referred to in other bios but not in the actual correspondence). Previously, clicking on the "Mentions..." link simply did nothing in these cases, but I've now added a trap for this condition and an appropriate error message.

12/03/12

Permalink 02:48:49 pm, by mholmes, 108 words, 58 views   English (CA)
Categories: Activity log; Mins. worked: 120

Work on documentation and crediting MM

Added appropriate credit to MM for her transcription work, and began the process of pulling documents from Google Docs into the actual repo, which is a bit easier to keep track of. Found one suitable document to get PCA started with full-doc transcription, and created a simple guide to the file/id/naming convention for our collection. Wrote a detailed assignment for PCA and sent it. This process will include a check that our Guidelines document in fact provides enough guidance for encoding a complete new document. Most likely we will be expanding it in the next week or two as PCA starts to add new transcriptions.

09/03/12

Permalink 10:46:52 am, by mholmes, 47 words, 63 views   English (CA)
Categories: Activity log; Mins. worked: 90

PCA's directed reading report

Reviewed the extensive (and excellent) work completed by PCA, who is now nearly at the end of the 1854 abstracts. Wrote a number of notes for tweaks and fixes, as well as a couple of requests for further research and the transcription of a mysteriously-untranscribed despatch (V547102A).

06/03/12

Permalink 05:40:53 pm, by mholmes, 51 words, 59 views   English (CA)
Categories: Activity log; Mins. worked: 45

Error with vessel info

PCA reported that mentions of the Brig William, wrecked in 1854, are linked to the vessel info for the William Allen, which is not the same ship at all. We dug around to find some references from which to construct a new vessel entry, and she's now going ahead with writing it.

02/03/12

Permalink 01:51:20 pm, by mholmes, 11 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 20

Retrieved stats

Note: Francotoile and the Mysteries projects are not showing any stats.

20/02/12

Permalink 05:03:23 pm, by mholmes, 41 words, 66 views   English (CA)
Categories: Activity log; Mins. worked: 120

Proofing/editing bios

Worked through the bios provided by TB, and made a couple of tweaks; found one person who has been misnamed for years. Also started work on TS's bios; I need to work through a couple of issues directly with him tomorrow.

13/02/12

Permalink 03:08:12 pm, by mholmes, 19 words, 73 views   English (CA)
Categories: Activity log; Mins. worked: 45

Reviewed PCA's first abstracts

PCA has completed abstracts for Jan-Feb 1854. Reviewed them and sent feedback, as well as updating her directed reading report.

02/02/12

Permalink 09:05:55 am, by mholmes, 7 words, 63 views   English (CA)
Categories: Activity log; Mins. worked: 15

Retrieved stats

Pulled down the server stats for January.

31/01/12

Permalink 04:46:28 pm, by mholmes, 28 words, 57 views   English (CA)
Categories: Activity log; Mins. worked: 60

Reviews of three more bios

PA will move on to abstracts for 1854. Meanwhile, I've sent some comments on the other three bios, and in the process added some name markup to an 1852 file.

27/01/12

Permalink 09:42:54 am, by mholmes, 23 words, 73 views   English (CA)
Categories: Activity log; Mins. worked: 45

Review of PA's first biographies

Reviewed the work PA has been doing -- it's excellent -- and wrote some feedback, as well as adding to the weekly report.

26/01/12

Permalink 12:12:17 pm, by mholmes, 47 words, 470 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

OAI records regenerated

Several thousand OAI-PMH records have been regenerated to take account of updates to despatches files and other XML documents in the collection. The process currently takes a long time, so it's only done every few months. OAI metadata records for the collection are now up to date.

Permalink 08:56:38 am, by mholmes, 33 words, 421 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

Colonial Despatches project added to dhcommons.org

The Colonial Despatches project has been added to the DH Commons project list. We are hoping through this to attract collaborators from other institutions who may be interested in researching, writing, and proofreading.

24/01/12

Permalink 04:39:12 pm, by mholmes, 30 words, 91 views   English (CA)
Categories: Activity log; Mins. worked: 60

PA moves into XML

Today PA did her first markup, encoding several of the peripheral bios that she's finished writing, and we posted them on the site, and added her to the credits page.

17/01/12

Permalink 11:17:03 am, by mholmes, 56 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 15

Fix to peripheral bio entry

TS found an error in a peripheral bio, where the bio for James Johnstone had been created from that for hamilton_t, but the Hamilton info had been left in it; additionally, the name itself was "Johnson" instead of "Johnstone". Fixed this, and also updated a reference to the person in the Johnstone Strait place entry.

16/01/12

Permalink 11:05:47 am, by mholmes, 52 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 15

Changes to BC Geo urls

Changed old BC Geo Names urls from this format:

http://ilmbwww.gov.bc.ca/bcgn-bin/bcg10?name=51611

to this format:

http://apps.gov.bc.ca/pub/bcgnws/names/51611.html

The form seems to have changed, and 16 out of 85 of our references were still using the old URL form. TB noticed the problem.
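
A change like this can be made with a single regular-expression substitution. The following is a minimal sketch in Python, assuming (as in the example above) that the numeric id carries over unchanged from the old URL to the new one; the function name is made up for the illustration.

```python
import re

# Old-style BC Geo Names URL, capturing the numeric feature id.
OLD_BCGN_URL = re.compile(
    r"http://ilmbwww\.gov\.bc\.ca/bcgn-bin/bcg10\?name=(\d+)"
)


def update_bcgn_urls(text):
    """Rewrite every old-style BC Geo Names URL in text to the new form."""
    return OLD_BCGN_URL.sub(
        r"http://apps.gov.bc.ca/pub/bcgnws/names/\1.html", text
    )
```

URLs already in the new form do not match the pattern, so running this over a file a second time changes nothing.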

10/01/12

Permalink 01:44:39 pm, by mholmes, 22 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 60

Setting up PA to work on bios

Gathered some resources for PA, and assigned a list of bios to work on. She's also now familiar with the Linux OS.

Permalink 11:13:41 am, by mholmes, 92 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 120

Fixed a bug

Fixed this bug, which affected the display of catchwords which were between, rather than within, paragraph tags. This involved a change to the CSS, but then I was able to fix nearly 190 instances of <fw> tags that were inside <p> tags and shouldn't have been. Where a catchword appears after the end of a paragraph, and the next page starts a new paragraph, the <fw type="catchword"> tag should be positioned after the end of the first paragraph, and before the <pb> tag.
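
For illustration, the repositioning rule could be expressed as the Python sketch below. It is not the method actually used for the fix; it assumes elements without a namespace, and only moves a catchword when it is the very last thing in its paragraph, so that no paragraph text is disturbed.

```python
import xml.etree.ElementTree as ET


def move_trailing_catchwords(root):
    """Move a catchword <fw> that ends a <p> to just after that <p>.

    Returns the number of <fw> elements moved.
    """
    moved = 0
    # Take a snapshot of the elements first, since the tree is edited below.
    for parent in list(root.iter()):
        for p in list(parent):
            if p.tag != "p" or len(p) == 0:
                continue
            last = p[-1]
            is_catchword = last.tag == "fw" and last.get("type") == "catchword"
            # Skip catchwords that are followed by more paragraph text.
            if is_catchword and not (last.tail or "").strip():
                p.remove(last)
                # The whitespace that followed the <p> now follows the <fw>.
                last.tail = p.tail
                p.tail = None
                parent.insert(list(parent).index(p) + 1, last)
                moved += 1
    return moved
```

Because the <fw> is inserted immediately after its paragraph, it ends up before any <pb> that follows that paragraph.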

09/01/12

Permalink 02:26:37 pm, by mholmes, 44 words, 501 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 90

Welcome to our new directed reading RA

Today the Colonial Despatches Project welcomes its fourth directed reading student, Petria Arienzale. She'll start work tomorrow on some of the biographies, and over the next few weeks she'll be learning about XML, TEI, oXygen and a host of other components of our work.

Permalink 08:33:20 am, by mholmes, 3 words, 54 views   English (CA)
Categories: Activity log; Mins. worked: 10

Retrieved ColDesp stats for all of 2011

Saved from Megapode.

Permalink 08:27:13 am, by mholmes, 287 words, 153 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 15

TO DO List from KSW

This is a list of stuff that KSW reports needs attention (from his Google Doc "Matters for deliberation"):

  • The placement and refreshment process of <name type="addressee"> tags. See the Search page for details on this. So far, we have been inconsistent with things of this nature. [Martin reports that we may have to address (ha-ha) this later]
  • Who is “EBP”? The name appears in 1859 and beyond. [I have emailed JH with an inquiry, and he does not know.]
  • How to style measurements, especially kilometres (spell it out?) vs. km vs. Km, and so on. Find this in Chicago. [Chicago uses km, as in “a 50 km race” -- I added this info to the Guidelines document.]
  • Content to tag, or not, in the <notes>
  • DONE 2012-03-13: Remove <date> tags from inside all <ref co_ref> tags in 1858. [Presumably this was for consistency, since only 17 files had these tags. They were not only in 1858, though.]
  • Tag instances of “the Governor” with Douglas’s tag?
  • Cotteris and Sons, and the like: to tag or not to tag. And, this raises the larger question on tagging companies.
  • To add district-related information to the Victoria profile. For example, some letters refer to the Victoria district. This may take some serious research, mostly in order to determine the historical boundaries of the same. Hmmmm.
  • At some point, we will have to search all the image collections for non-transcribed files. I have created a spreadsheet file called “Transcriptions required” for files I have caught, ad-hoc. However, we should approach this problem systematically. One method would be to work through each image collection and confirm, by despatch number, that each image has a transcribed equivalent.
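
The systematic check suggested in the last item amounts to a set comparison. Here is a rough sketch in Python, under an assumed naming convention that may not match ours: page images named "<despatch id>_<page>.jpg" and transcriptions named "<despatch id>.xml". The function name and the sample file names in the usage note are likewise hypothetical.

```python
from pathlib import PurePath


def untranscribed(image_names, transcription_names):
    """Return the sorted despatch ids that have images but no transcription."""
    transcribed = {PurePath(name).stem for name in transcription_names}
    missing = set()
    for name in image_names:
        # Assumed convention: the despatch id is everything before the first "_".
        despatch_id = PurePath(name).stem.split("_")[0]
        if despatch_id not in transcribed:
            missing.add(despatch_id)
    return sorted(missing)
```

Given images "V547102A_001.jpg" and "V54001_001.jpg" and a single transcription "V54001.xml", the function reports only "V547102A". The output could then replace the ad-hoc “Transcriptions required” spreadsheet.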

Colonial Despatches

The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.
