Archives for: 2011

20/12/11

Permalink 11:19:13 am, by kim, 214 words, 194 views   English (CA)
Categories: Activity log, Documentation; Mins. worked: 25

Duplicate documents across collections

This concerns documents that appear in both letter-book and original form, and how to handle this crossover.

For example, we found a dozen or so documents in 1859 that are part of both the 398/1 (BC series) and RG7G8C (VI series) collections. We decided that it was best to show both, but to alert the reader, in each document, to the existence of the corresponding copy or original.

So, in the RG7G8C version of this file, we added this note:

<note xml:id="B597018_1">Please note that this document exists as a <ref type="doc" cRef="V597018.scx">letter-book copy</ref>, as part of the British Columbia collection.</note>

And in this document, the 398/1 version, we added this note:

<note xml:id="V597011_1">The original form of this correspondence <ref type="doc" cRef="B597011.scx">can be viewed here</ref>. Please note that the original was marked initially as part of the Vancouver Island collection, and changed thereafter, presumably after receipt, to the British Columbia collection.</note>

For now, we have worked through most of the 1859 collection for duplicates. We will have to check in the CO410 collection for the same issue, and do the same for all applicable years.
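
As we work through further years, it may help to check that every cross-reference note has a matching note in the other document. The following is only a rough sketch in Python, not part of our actual tooling: the helper name `missing_backlinks`, the sample document texts, and the assumption that every `cRef` ends in `.scx` are all illustrative.

```python
import re

# Assumption: cross-references always look like <ref type="doc" cRef="ID.scx">.
CREF = re.compile(r'<ref type="doc" cRef="([A-Za-z0-9]+)\.scx"')

def missing_backlinks(docs):
    """docs maps a document id to its XML text.

    Returns sorted (source, target) pairs where source links to target
    but target has no link back to source.
    """
    links = {doc_id: set(CREF.findall(xml)) for doc_id, xml in docs.items()}
    return sorted(
        (src, tgt)
        for src, targets in links.items()
        for tgt in targets
        if src not in links.get(tgt, set())
    )

# Hypothetical, abbreviated document contents for illustration only.
docs = {
    "B597018": '<note xml:id="B597018_1"><ref type="doc" cRef="V597018.scx">letter-book copy</ref></note>',
    "V597018": '<note xml:id="V597018_1"><ref type="doc" cRef="B597018.scx">can be viewed here</ref></note>',
    "B597011": "<p>No cross-reference note yet.</p>",
    "V597011": '<note xml:id="V597011_1"><ref type="doc" cRef="B597011.scx">can be viewed here</ref></note>',
}

print(missing_backlinks(docs))  # [('V597011', 'B597011')]
```

Any pair reported here would simply be a prompt to open both files and add the missing note by hand.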

15/12/11

Permalink 10:16:53 am, by kim, 164 words, 240 views   English (CA)
Categories: Tasks; Mins. worked: 0

Dangling catchwords

EDIT: This is fixed as of 2012-01-10.

This task has to do with catchwords and how to position them properly on the website (flush to the right margin of the body text) when they appear after the final paragraph of a given page, just prior to the page break.

For now, we have wrapped the FW tag in a P tag, as in the following example:

[...] with a view to their protection and civilization.</p>

<p><fw type="catchword" rend="text-align: right;">I</fw></p><pb n="rg7_g8c_08/rg7_g8c_08_00043v.jpg"/>

<p>I am glad to find that your sentiments respecting [...]

The above is a workaround, as it could be argued that the catchword does not, in itself, represent a paragraph. So, we will need to, eventually, develop a way to display these dangling catchwords appropriately and, in the process, remove the P tags we use now.
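
When the time comes to remove the workaround, we will first need to find every paragraph that exists only to hold a catchword. Here is a minimal Python sketch of one way to do that, assuming the files use the TEI namespace; the function name `catchword_paragraphs` and the trimmed sample are ours for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def catchword_paragraphs(root):
    """Return <p> elements that contain nothing but a single catchword <fw>."""
    found = []
    for p in root.iter(TEI + "p"):
        children = list(p)
        if (
            len(children) == 1
            and children[0].tag == TEI + "fw"
            and children[0].get("type") == "catchword"
            and not (p.text or "").strip()            # no text before the <fw>
            and not (children[0].tail or "").strip()  # no text after the <fw>
        ):
            found.append(p)
    return found

# Sample based on the example above, with the elided text left out.
sample = """<body xmlns="http://www.tei-c.org/ns/1.0">
<p>with a view to their protection and civilization.</p>
<p><fw type="catchword" rend="text-align: right;">I</fw></p>
<pb n="rg7_g8c_08/rg7_g8c_08_00043v.jpg"/>
<p>I am glad to find that your sentiments respecting</p>
</body>"""

hits = catchword_paragraphs(ET.fromstring(sample))
print([p[0].text for p in hits])  # ['I']
```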

13/12/11

Permalink 10:17:57 am, by mholmes, 41 words, 467 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

Complete set of RG7 G8C Vol 9 page images added to the Colonial Despatches collection

The complete collection of 712 page images for RG7 G8C Vol 9 (in three different sizes) has been added to the collection. These cover the second part of 1860-61 BC Despatches from London. We originally had only the first 25 of these images.

06/12/11

Permalink 03:10:28 pm, by kim, 57 words, 194 views   English (CA)
Categories: Tasks; Mins. worked: 5

Remove B587055A.xml from the coldesp collection

This is a call to remove B587055A.xml from the coldesp collection, as it is a duplicate of B587056A.xml.

B587055A.xml is incomplete in its transcription of a private letter found in the 398/1 image collection. B587056A.xml, however, provides a complete transcription and, moreover, it follows the correct sequence within the 398/1 image-collection.

Permalink 01:27:45 pm, by mholmes, 22 words, 393 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

RG7 G8C Vol 8 page images added to the Colonial Despatches collection

361 new images (in three different sizes) have been added to the collection. These cover the second part of BC 1859 Despatches from London.

02/12/11

Permalink 08:41:45 am, by mholmes, 10 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 10

Retrieved Coldesp stats

Stats show a noticeable increase in usage through the semester.

30/11/11

Permalink 10:28:58 am, by mholmes, 20 words, 360 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

RG7 G8C Vol 7 page images added to the Colonial Despatches collection

Over 400 new images (in three different sizes) have been added to the collection. These cover the BC 1859 Despatches from London.

28/11/11

Permalink 04:45:49 pm, by mholmes, 22 words, 58 views   English (CA)
Categories: Activity log; Mins. worked: 30

RG7 G8C Vol 6 and 9 page images added to the Colonial Despatches collection

Over 600 new images (in three different sizes) have been added to the collection. These cover the BC 1858 and 1860-61 Despatches from London.

22/11/11

Permalink 11:26:35 am, by mholmes, 21 words, 375 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 15

RG7 G8C Vol 3 page images added to the Colonial Despatches collection

439 new images (in three different sizes) have been added to the collection. These cover the Vancouver Island 1862-63 Despatches from London.

21/11/11

Permalink 04:23:19 pm, by mholmes, 43 words, 87 views   English (CA)
Categories: Activity log; Mins. worked: 15

Updates to credits and press coverage

KSW pointed out that we had a little press coverage in the TC on Nov 20, so I've added that to the relevant page on the site. I've also added a couple of the new folks working on the site to the Credits page.

16/11/11

Permalink 02:43:25 pm, by mholmes, 148 words, 90 views   English (CA)
Categories: Activity log; Mins. worked: 20

Bug fixed: queries exceeding output size limit

The default output size limit for an XQuery is 10000. When requesting a list of "Mentions of this place in the documents" when the place is Vancouver Island, an error was occurring because this limit was exceeded. I've fixed this in two ways:

  1. First, as a general principle, we want to allow larger result sets across all XQuery operations, so I modified the following line in WEB-INF/conf.xml to change 10000 to 20000:
    <watchdog output-size-limit="20000" query-timeout="-1"/>
    
    That change apparently does not take effect until the server is restarted. However, I didn't want or need to restart the server to fix the bug right now, because of #2.
  2. You can specify the output size limit inside an XQuery file, so I've included this in the getRefs.xq file to solve the immediate problem:
    declare option exist:output-size-limit "20000";
    

If similar problems show up in future, we can make further increases.

15/11/11

Permalink 10:28:57 am, by mholmes, 21 words, 276 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

RG7 G8C Vol 2 page images added to the Colonial Despatches collection

719 new images (in three different sizes) have been added to the collection. These cover the Vancouver Island 1859-61 Despatches from London.

02/11/11

Permalink 08:48:42 am, by mholmes, 2 words, 114 views   English (CA)
Categories: Activity log; Mins. worked: 15

Got stats for ytd October

...from Megapode.

01/11/11

Permalink 03:44:26 pm, by mholmes, 166 words, 112 views   English (CA)
Categories: Activity log; Mins. worked: 30

Handy XQuery for finding possibly-bad page links

These two blocks of XQuery will search for page-image links in the header <biblScope> and in <pb> tags and report any that don't match the expected pattern. That doesn't mean they're bad, just that they need checking.

header <biblScope>s:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";

for $b in //biblScope[@type="startPageImage"]
let $bits := tokenize($b/@facs, "/")
where not(starts-with($bits[2], $bits[1]))
or not(matches($b/@facs, '((co)|(rg7))_((g8c)|([0-9]{1,3}))_[0-9]{2,2}/((co)|(rg7))_((g8c)|([0-9]{1,3}))_[0-9]{2,2}_[0-9]{5,5}[rv].jpg'))
return (xs:string($b/ancestor::TEI/@xml:id), $b)

<pb> tags in the body:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";

for $pb in //pb[@n]
let $bits := tokenize($pb/@n, "/")
where not(starts-with($bits[2], $bits[1])) 
or not(matches($pb/@n, '((co)|(rg7))_((g8c)|([0-9]{1,3}))_[0-9]{2,2}/((co)|(rg7))_((g8c)|([0-9]{1,3}))_[0-9]{2,2}_[0-9]{5,5}[rv].jpg'))
return (xs:string($pb/ancestor::TEI/@xml:id), $pb)
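
For checking a list of links outside the database, the same two tests can be approximated in Python. This is a sketch rather than a drop-in replacement: the function name `needs_checking` is made up, the sample links other than the first are invented, and unlike the XQuery above it escapes the dot before `jpg`.

```python
import re

# Same shape as the XQuery pattern: folder name, slash, file name.
PATTERN = re.compile(
    r"(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}/"
    r"(co|rg7)_(g8c|[0-9]{1,3})_[0-9]{2}_[0-9]{5}[rv]\.jpg"
)

def needs_checking(link):
    """True if the link fails either test and should be looked at by hand."""
    bits = link.split("/")
    # Test 1: the file name must start with the folder name.
    if len(bits) < 2 or not bits[1].startswith(bits[0]):
        return True
    # Test 2: the whole link must contain the expected pattern.
    return PATTERN.search(link) is None

print(needs_checking("rg7_g8c_08/rg7_g8c_08_00043v.jpg"))  # False
print(needs_checking("rg7_g8c_08/rg7_g8c_09_00043v.jpg"))  # True (folder mismatch)
print(needs_checking("co_60_07/co_60_07_0012r.jpg"))       # True (only four digits)
```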

26/10/11

Permalink 11:01:19 am, by kim, 86 words, 176 views   English (CA)
Categories: Documentation; Mins. worked: 0

How we handle incomplete entries for places, people, and vessels

A detailed write-up of the information below has been added to the Guidelines document. For now, the following examples, where the attributes are emphasized, should suffice:

  • For places: <placeName type="incomplete">Point Aitch Bee Cee</placeName>
  • For people: <persName type="incomplete"> <surname>Andrews</surname>, <forename> J.</forename><forename> A.</forename></persName>
  • For vessels: <name subtype="incomplete" type="vessel" key="archer">Archer</name>
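
To pull together a to-do list of incomplete entries, something like the following Python sketch could work. It assumes only that the flag appears as either a type or a subtype attribute, as in the three examples above; the function name `incomplete_entries` and the fourth (complete) vessel in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def incomplete_entries(root):
    """List (tag, text) for every element flagged incomplete via type or subtype."""
    hits = []
    for el in root.iter():
        if "incomplete" in (el.get("type"), el.get("subtype")):
            hits.append((el.tag, "".join(el.itertext()).strip()))
    return hits

# Namespace omitted to keep the sample short; the last <name> is invented.
sample = """<div>
<placeName type="incomplete">Point Aitch Bee Cee</placeName>
<persName type="incomplete"><surname>Andrews</surname>, <forename>J.</forename> <forename>A.</forename></persName>
<name subtype="incomplete" type="vessel" key="archer">Archer</name>
<name type="vessel" key="example">Example</name>
</div>"""

for tag, text in incomplete_entries(ET.fromstring(sample)):
    print(tag, "->", text)
```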

25/10/11

Permalink 02:38:17 pm, by mholmes, 80 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 15

Stats generation

Quick-and-dirty XQuery to generate stats for 1860:

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";

let $pbs := count(//TEI[substring(@xml:id, 2, 2) = '60']//pb[@n]),
$biblScopes := count(//TEI[substring(@xml:id, 2, 2) = '60']//biblScope[@type='startPageImage']),
$tot := ($pbs + $biblScopes)
return concat('Page-break tags: ', $pbs, '; biblScopes: ', $biblScopes, '; total: ', $tot)

xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";

let $names := count(//TEI[substring(@xml:id, 2, 2) = '60']//name[@key])
return $names

Permalink 12:51:12 pm, by mholmes, 24 words, 54 views   English (CA)
Categories: Activity log; Mins. worked: 30

Updated stats page

Made some updates to the stats page (stats.htm) so that it displays more useful data about the state of completion of the documents.

24/10/11

Permalink 11:43:44 am, by kim, 134 words, 175 views   English (CA)
Categories: Documentation; Mins. worked: 0

A note on the Peripheral Vessels file

A "peripheral_vessels.xml" file was created to house vessels mentioned in files other than the despatches. For example, in Captain Cook's biography, we might mention his ship, Discovery, which does not appear in the despatches, at least not in the content transcribed currently.

As we discussed as a team, it seems odd that the online reader should encounter some vessels tagged and others not. After all, readers do not know which vessels occur in the letters and which do not. The peripheral-vessels file cures this potential for confusion.

Lastly, should a vessel that appears in the peripheral-vessels file one day be discovered elsewhere, say, if the enclosures are eventually transcribed, then we would move the respective vessel entry over to the "vessels.xml" file, a simple copy/paste operation.

13/10/11

Permalink 01:17:49 pm, by mholmes, 21 words, 225 views   English (CA)
Categories: Announcements; Mins. worked: 20

CO 305 vol 16 page images added to the Colonial Despatches collection

144 new images (in three different sizes) have been added to the collection. These cover the Public Accounts for Vancouver Island, 1857-1860.

27/09/11

Permalink 05:38:14 pm, by mholmes, 45 words, 64 views   English (CA)
Categories: Activity log; Mins. worked: 15

RA didn't commit -- had to sudo it

KSW noticed that TS had forgotten to commit his changes to SVN, but I was able to log into his machine as hcmc and sudo svn commit. Have to make sure that didn't result in any permissions changes that would prevent future updates or commits.

Permalink 01:25:01 pm, by kim, 97 words, 179 views   English (CA)
Categories: Documentation; Mins. worked: 0

SVN tricks and tips

This page will list our SVN conundrums and how we solved them! And, should this page miss something, check this website.

  • When we need to compare two versions of the same file (or files), that is, to find the differences between them, use this:
    svn diff -r [version number]:[version number]
    As in this example:
    svn diff -r 460:481 B60001.xml
    This was used to look at two versions of the same file: B60001.xml from version 460 and version 481. The SVN report marks, with little plus and minus signs, the lines and content added or removed, respectively.
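
If the diff is long, a quick tally of the plus and minus lines can be handy. The sketch below is only an illustration in Python: the function name `summarize_diff` and the sample diff text are made up, and it assumes the usual unified-diff layout in which the two file-header lines start with `---` and `+++`.

```python
def summarize_diff(diff_text):
    """Count added and removed lines in unified-diff output."""
    added = removed = 0
    for line in diff_text.splitlines():
        # Skip the file headers. Caveat: a removed line whose own content
        # starts with "--" would be skipped too.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed

# Invented sample output for illustration.
sample = """Index: B60001.xml
===================================================================
--- B60001.xml (revision 460)
+++ B60001.xml (revision 481)
@@ -10,3 +10,4 @@
 <p>unchanged line</p>
-<p>old wording</p>
+<p>new wording</p>
+<p>an added paragraph</p>
"""

print(summarize_diff(sample))  # (2, 1)
```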

19/09/11

Permalink 09:22:58 am, by mholmes, 24 words, 61 views   English (CA)
Categories: Activity log; Mins. worked: 30

KSW has upload privileges into the db

Set up KSW with upload privileges over most of the data areas of the db, so he can refresh files whenever he needs to.

16/09/11

Permalink 02:06:02 pm, by mholmes, 32 words, 52 views   English (CA)
Categories: Activity log; Mins. worked: 30

Welcome to TS

TS joined the team today as a workstudy. I've set him up on Onion, and we've gone through the procedures around use of SVN and oXygen. KSW will take over on Tuesday.

02/09/11

Permalink 09:55:41 am, by mholmes, 3 words, 403 views   English (CA)
Categories: Activity log; Mins. worked: 15

Gathered website stats for ytd

As in title.

31/08/11

Permalink 02:11:14 pm, by kim, 37 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 5

Coldesp Image Inventory

I have completed an image inventory for all the myriad collections we have on file, in a few locations. This can be viewed here as a webpage, which is updated automatically whenever any future changes are made.

30/08/11

Permalink 11:10:21 am, by mholmes, 44 words, 58 views   English (CA)
Categories: Activity log; Mins. worked: 15

CO 305 vol 17: first few images processed and posted

One of our existing documents from 1860 was found in CO 305 vol 17 (no other 1860 documents are in there), so KSW has processed the first 20-odd pages of 305/17, and we've added them to the collection, but the rest remain to be done when 1861 is being processed.

29/08/11

Permalink 01:34:33 pm, by mholmes, 30 words, 271 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO 398 vol 1 page images added to the Colonial Despatches collection

526 new images (in three different sizes) have been added to the collection. These cover the Entrybooks of Correspondence for BC, 1858-1861. They are currently being linked into the transcription documents.

08/08/11

Permalink 10:35:55 am, by mholmes, 69 words, 59 views   English (CA)
Categories: Activity log; Mins. worked: 15

Backed up JPEGs to Rutabaga (updated 2013-05-22)

2382 new images to go up, and I realized I hadn't previously documented the changed paths resulting from the change from the old Rutabaga to the new DS hardware. Here's the command now: log into nfs.hcmc.uvic.ca, go to /home1t/coldesp/www, and run:

rsync --stats --recursive --times --delete --verbose -e ssh jpg_scans/ "mholmes@rutabaga.hcmc.uvic.ca:/volume1/homes/mholmes/Colonial_Despatches/www/jpg_scans/"

Permalink 10:28:55 am, by mholmes, 99 words, 59 views   English (CA)
Categories: Activity log; Mins. worked: 60

CO 60 Nos. 7, 8 and 9 added to the manuscript image collection

Three new sets of page-images have now been added to the collection:

  • CO 60, Vol 7 (BC 1860 Despatches January to July)
  • CO 60, Vol 8 (BC 1860 Despatches August to December)
  • CO 60, Vol 9 (BC 1860 Public Offices and Miscellaneous)

In all, 2382 new images have been added (at three different sizes, as usual). Substantial updates have also been made to the document markup for 1860, and additions have been made to the biographies and the place database. (All this good work is of course Kim's; my role is just to integrate all the changes into the database, make backups, update the information pages on the site, etc.)

Permalink 10:24:32 am, by mholmes, 5 words, 45 views   English (CA)
Categories: Activity log; Mins. worked: 15

Got Coldesp stats

Retrieved ytd stats from Megapode.

04/07/11

Permalink 08:03:47 am, by mholmes, 11 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 10

Got Coldesp stats

Stats for the first half of the year retrieved from Megapode.

28/06/11

Permalink 01:42:26 pm, by mholmes, 24 words, 86 views   English (CA)
Categories: Activity log; Mins. worked: 15

Backed up JPEGs to Rutabaga

Now the new Rutabaga is online, I was able to update the backups of the jpg_scans image tree with an almighty rsync operation.

27/06/11

Permalink 04:44:49 pm, by mholmes, 27 words, 473 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

CO Series 305 vol 15 page images now posted to the site

A further 1348 page-images have been added to the manuscript image browser, covering Vancouver Island Public Offices and Miscellaneous correspondence, 1860. Transcriptions are now being linked into these images.

17/06/11

Permalink 12:27:04 pm, by mholmes, 46 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 60

OAI metadata regenerated and re-uploaded

Yesterday I generated all the OAI metadata using my local copy of the main eXist 1.1.1 application, forgetting that its XQuery functionality is limited compared with the newer one. The resulting records were missing lots of lookup data from the ographies, so I've regenerated and re-uploaded them.

Permalink 12:21:49 pm, by mholmes, 25 words, 469 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 60

OAI Metadata interface explained

A page has been added to the site explaining the OAI-PMH metadata interface and how it works, with links to example queries returning XML responses.

16/06/11

Permalink 05:30:17 pm, by mholmes, 81 words, 94 views   English (CA)
Categories: Activity log; Mins. worked: 90

Updates to all XML data and OAI

My local copy of ColDesp was out of sync and out of date, so I've updated it. This took a bit of diffing to figure out which files had been removed from the set, along with their corresponding OAI files. Then I regenerated all the OAI records and uploaded them into the server db. Finally, I backed everything up, and took a complete local copy of Tomcat + eXist to copy to my laptop, for the conference, in case of connectivity issues.

Permalink 04:15:11 pm, by mholmes, 124 words, 94 views   English (CA)
Categories: Activity log; Mins. worked: 120

Fixes for JavaScript popup functionality

The index pages (people, places, vessels) have some peculiar referencing complexities, in that any item on the page can have links which reference other items on the same page, or items which must be retrieved by AJAX. Previously, in the case of a local item, the JavaScript was moving the content from its normal place on the page into the popup, and then in theory putting it back again, but that actually resulted in a blank space in some situations (such as when the popup was closed, rather than being filled with new content). I've now rewritten the system so that it does what the Mariage site does: it clones the content of the item into the popup, and deletes it when it's done.
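
The difference between moving and cloning a node is easy to demonstrate outside the browser. The following Python sketch uses the standard library's minidom, whose appendChild, like the browser DOM's, detaches a node from its old parent before attaching it. It is an illustration of the principle, not the site's JavaScript, and the tiny page structure is invented.

```python
from xml.dom.minidom import parseString

# Invented miniature of an index page with one item and an empty popup.
doc = parseString('<page><index><item id="a">Victoria</item></index><popup/></page>')
index = doc.getElementsByTagName("index")[0]
popup = doc.getElementsByTagName("popup")[0]
item = index.getElementsByTagName("item")[0]

# New approach: put a deep clone in the popup; the index keeps its item.
popup.appendChild(item.cloneNode(True))
after_clone = len(index.getElementsByTagName("item"))

# Closing the popup just deletes the clone.
while popup.firstChild:
    popup.removeChild(popup.firstChild)

# Old approach: appending the original node moves it out of the index.
popup.appendChild(item)
after_move = len(index.getElementsByTagName("item"))

print(after_clone, after_move)  # 1 0
```

With the clone, closing the popup never has to put anything back, which is what avoids the blank space.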

14/06/11

Permalink 03:00:35 pm, by kim, 48 words, 199 views   English (CA)
Categories: Tasks; Mins. worked: 0

BUG: vanishing place names

The problem: on the places index page, when you click on a place within a places write up, the clicked-place vanishes from the places index. Rather than cloning the content, the link appears to relocate it. This may be happening in the bios and vessels list as well.

08/06/11

Permalink 05:08:07 pm, by mholmes, 26 words, 483 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 15

CO Series 305 vol 14 page images now posted to the site

A further 941 page-images have been added to the manuscript image browser, covering Despatches from Vancouver Island to London, 1860. Transcriptions are now being linked into these images.

02/06/11

Permalink 10:59:43 am, by mholmes, 5 words, 1295 views   English (CA)
Categories: Activity log; Mins. worked: 15

Retrieved stats for May

YTD stats retrieved and stashed.

27/05/11

Permalink 10:08:20 am, by mholmes, 65 words, 74 views   English (CA)
Categories: Activity log; Mins. worked: 60

Linear paths now working in Google maps

I've tweaked the XQuery which produces the KML file so that it can recognize when a <place> has <location type="path">, and in that case it produces a LineString element instead of a <Polygon> or a <Point>, and it doesn't supply the closing georef which repeats the first, which is what we use to close a polygon.
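
The choice between the three geometry types can be sketched in a few lines of Python. This is not the XQuery itself: the function name `kml_geometry`, the (lat, lon) input format, and the sample coordinates are assumptions for illustration. KML itself expects each coordinate as longitude,latitude.

```python
def kml_geometry(points, location_type=None):
    """Build a KML geometry string from a list of (lat, lon) pairs.

    location_type mirrors the type attribute on <location>.
    """
    def coords(pts):
        # KML wants "lon,lat" tuples separated by whitespace.
        return " ".join(f"{lon},{lat}" for lat, lon in pts)

    if len(points) == 1:
        return f"<Point><coordinates>{coords(points)}</coordinates></Point>"
    if location_type == "path":
        # A path is left open: no closing coordinate.
        return f"<LineString><coordinates>{coords(points)}</coordinates></LineString>"
    # A polygon repeats its first coordinate to close the ring.
    ring = list(points) + [points[0]]
    return (
        "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
        f"{coords(ring)}"
        "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    )

# Invented coordinates.
river = [(49.1, -123.0), (49.2, -122.5), (49.3, -122.0)]
print(kml_geometry(river, "path"))
print(kml_geometry(river))
print(kml_geometry([(48.4, -123.4)]))
```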

26/05/11

Permalink 11:54:19 am, by kim, 59 words, 222 views   English (CA)
Categories: Tasks, Documentation; Mins. worked: 5

Geo tags: paths and lines, not polygons and bounding boxes

We use paths for rivers. To make a path, or line, appear on Google Maps you need to override the code that automatically converts multiple (not single, of course) <geo> coordinates to polygons.

To do this, you need to add a type="path" attribute to the <location> tag, as follows: <location type="path">.

16/05/11

Permalink 05:22:01 pm, by mholmes, 12 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 60

Finished my OAI poster

Although it's a bit ugly. No time to get too fancy, unfortunately.

13/05/11

Permalink 02:26:31 pm, by mholmes, 10 words, 68 views   English (CA)
Categories: Activity log; Mins. worked: 90

More work on the posters

Did most of a diagram of the OAI process today.

12/05/11

Permalink 11:02:21 am, by mholmes, 27 words, 1702 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO Series 6 vol 32 page images now posted to the site

A further 1298 page-images have been added to the manuscript image browser, covering British North America 1860 Public Offices and Individuals. Transcriptions are now being linked into these images.

10/05/11

Permalink 03:52:47 pm, by mholmes, 52 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 120

More work on the poster presentation

Today's progress, with help from KSW:

  • Made poster templates for portrait and landscape at 18x24.
  • Changed out the page-images for the "What's in a Despatch" poster, because the original was actually an admiralty letter rather than a despatch. Re-labelled the new document.
  • Resized the enumeration poster.
  • Planned out the OAI generation poster.

09/05/11

Permalink 05:05:37 pm, by mholmes, 48 words, 1843 views   English (CA)
Categories: Activity log; Mins. worked: 45

Started work on the poster presentation

Retrieved the original launch documents and fixed missing images caused by hard-coded paths to raster graphics in the SVG. Updated the "Despatches by numbers" one. Over the next couple of days, we'll reformat these for an appropriate size, make an OAI one, and get them printed and laminated.

04/05/11

Permalink 05:04:45 pm, by mholmes, 32 words, 79 views   English (CA)
Categories: Activity log; Mins. worked: 20

Got RM set up and put him on the credits

Got RM set up with SVN instructions and working on the places.xml file. He's now made some edits, so I've updated the db and added him to the project team page.

04/04/11

Permalink 08:03:33 am, by mholmes, 2 words, 83 views   English (CA)
Categories: Activity log; Mins. worked: 10

Gathered stats for March

Nothing unusual.

31/03/11

Permalink 09:20:55 am, by kim, 70 words, 183 views   English (CA)
Categories: Documentation; Mins. worked: 0

SVN errors and solutions

This lists some of the errors that we have encountered, and their respective solutions!

Upon my morning svn update I received this error:

kim@dandelion:~/Desktop/coldesp_xml/xml$ svn update
svn: warning: cannot set LC_CTYPE locale
svn: warning: environment variable LANG is en_CA
svn: warning: please check that your locale name is correct

SOLUTION: type this into terminal

export LC_ALL=C

I found this info here.

30/03/11

Permalink 12:08:09 pm, by kim, 44 words, 200 views   English (CA)
Categories: Documentation; Mins. worked: 0

Regular expressions for Oxygen: Lookahead and Lookbehind

Use this to find phrases excluding certain words. For example, if you want to find Hudson's Bay, but without "Company," "House," "Territory," and so on:

  • Hudson's Bay (?![Cc]o)(?![Hh]ouse)(?![Tt]er)

This info taken from here: http://www.regular-expressions.info/lookaround.html
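
The same lookahead syntax works in Python's re module, which makes it easy to try the pattern on a few phrases before running it in Oxygen. The sample phrases below are invented. Note that the pattern includes the space after "Bay", so it only matches when another word follows.

```python
import re

# Same expression as above: "Hudson's Bay " not followed by Co..., House, or Ter...
pattern = re.compile(r"Hudson's Bay (?![Cc]o)(?![Hh]ouse)(?![Tt]er)")

samples = [
    "the Hudson's Bay Company",
    "at Hudson's Bay House",
    "Hudson's Bay Territory",
    "the shores of Hudson's Bay in winter",
]

matches = [s for s in samples if pattern.search(s)]
print(matches)  # ["the shores of Hudson's Bay in winter"]
```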

22/03/11

Permalink 11:27:00 am, by mholmes, 46 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 60

Regenerated all OAI records

Catching up with recent changes -- set my local version of the app to regenerate all the OAI records from the latest XML files while out at the dentist, then copied them down from there, committed them to SVN, and uploaded them into the live app.

10/03/11

Permalink 03:31:57 pm, by kim, 29 words, 84 views   English (CA)
Categories: Activity log; Mins. worked: 3

Added placeholder entries for placenames found in 1859

Completed the addition of new placenames from 1859; there are 59 additions. Hopefully, Theo and Shaun can get through their respective tasks in time to help to complete them.

09/03/11

Permalink 02:13:36 pm, by mholmes, 271 words, 56 views   English (CA)
Categories: Activity log; Mins. worked: 120

Reworking of KML for Places entries

Following yesterday's post, I've rewritten the XQuery that handles generating the KML files that are passed to Google Maps, along with tweaks to the XSLT that creates the map links in place data display on the site:

  • Both types of location (point and polygon) are now handled by the same getKml.xq file.
  • The file builds either a <LinearRing> (for a polygon) or a <Point> (for a point) tag into the KML.
  • Points have no "natural" zoom level, so they were zooming right in to street level by default. We first attempted to take care of this by setting an altitude of 20000 metres, but this works only for Google Earth, not Google Maps, so we did some experimentation and came up with a zoom level of 8 as the best overall setting, and we now specify that as part of the GET array in the URL for Google Maps (in the case of a point).
  • The content of <place>/<desc> tags is now output as plain text twice in the output KML: once as a <Snippet> element (which means it shows up on the left of the map, next to the placename), and once again as a <description> element (which means it shows up inside the "speech bubble"-style popup when you click on the location's pushpin marker).
  • We decided against trying to style the description content or embellish it with links, because that would require a second level of processing (XSLT), slowing down the operation, and because most of the links inside placename entries are internal AJAX calls anyway.
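
For anyone who wants to see the shape of the output, here is a rough Python sketch of a point placemark carrying the description twice. It is not the getKml.xq code: the function name `placemark`, the sample place, and its coordinates are invented, and the KML namespace and surrounding document wrapper are left out for brevity.

```python
import xml.etree.ElementTree as ET

def placemark(name, desc_text, lat, lon):
    """Build a point <Placemark> with the description as Snippet and description."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    # Plain text only, emitted twice: side panel and popup bubble.
    ET.SubElement(pm, "Snippet").text = desc_text
    ET.SubElement(pm, "description").text = desc_text
    point = ET.SubElement(pm, "Point")
    ET.SubElement(point, "coordinates").text = f"{lon},{lat}"  # KML order: lon,lat
    return ET.tostring(pm, encoding="unicode")

# Invented sample entry.
kml = placemark("Example Point", "A fort & harbour.", 48.4, -123.4)
print(kml)
```

Because the text is set through ElementTree, characters such as the ampersand are escaped automatically in the output.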

08/03/11

Permalink 10:48:45 am, by mholmes, 145 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 30

KML file handling for Google Maps

Right now, we're handling the link between the places entries and Google maps in this way:

  • If there's a single point (one <geo> element) for the location, we're simply passing the coordinates and a placename to Google Maps, which then displays the point.
  • If there are multiple points (a polygon), we pass Google Maps the URL of a pipeline that constructs a KML file, which results in a slightly richer display (but not rich enough).

What we need to do:

  • Rewrite the XQuery so that it can distinguish between points and polygons in the source, and in the case of a point, construct a Point element instead of a LinearRing.
  • Change the XSLT handling place entries so that the KML option is used in all cases.
  • Look at enriching the KML with some extra information, using the description tag or some other mechanism.

07/03/11

Permalink 02:32:32 pm, by kim, 109 words, 54 views   English (CA)
Categories: Activity log; Mins. worked: 5

Biographies file fixes

The biographies files have been tuned up. And, I have added an 1859 biographies file to the "Bios" folder. I have yet to add the list of new names from 1859, as I am waiting for Theo to finish his additions from 1858, since the names I found in '59 may have occurred first in '58.

In all the current biography files, I have deleted duplicate entries and formatted correctly several of the entries, and where required, updated the necessary tags in the despatches. Along the way, this process has revealed ways in which we can standardize further the way we handle biography entries. I will post these amended standards on the Guidelines document.

Permalink 02:24:38 pm, by kim, 60 words, 56 views   English (CA)
Categories: Activity log; Mins. worked: 5

1859 update

I have completed my file-by-file pass of all the 1859 files, adding place, date, people, vessel, and First Nations tags.

Next, I will add the new-found names of people, places, and vessels to their respective content-files. I predict this will take roughly two days, and then we will require entries to be written for, at least, the place names and vessels.

02/03/11

Permalink 08:55:57 am, by mholmes, 3 words, 53 views   English (CA)
Categories: Activity log; Mins. worked: 10

Got the latest stats

Retrieved from Urchin.

23/02/11

Permalink 03:27:25 pm, by mholmes, 45 words, 61 views   English (CA)
Categories: Activity log; Mins. worked: 40

Uploaded fixed OAI files

It seems I can only generate OAI files correctly using my recent Cocoon/eXist build locally. I really need to port the project over to that, but it might be complicated; it certainly requires a rebuild of my xqsearchutils library, which is currently generating errors...

21/02/11

Permalink 05:04:02 pm, by mholmes, 181 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 120

Looking through incoming transcriptions

Began the process of looking through M's incoming transcriptions, of which there are 156 files. The principal difficulties initially are these:

  • Some (not many so far) consist of transcriptions of enclosures for existing documents; these should just be merged into the existing XML files, so are not problematic at all.
  • Others will need to acquire filenames/ids. This means slotting them in amongst existing documents in the database, which means some numbering with appended A, B, C etc., following this plan.
  • In order to generate appropriate ids, they need to be sorted first by type (whether they're 580, 585, 586, 587 etc.) and then by date. I've made a start on the latter, by expanding the beginning of the existing filename to include a full yy-mm-dd date (received date), and I'll then look at them again to group them into folders by type.
  • Once that's done, I can give each file its actual target name in the db, and then after that it should be practical to start doing some markup, starting with the shorter files, in the knowledge that the later ones are already named.

16/02/11

Permalink 09:22:39 am, by mholmes, 19 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 15

SM has access to SVN

SM is working on abstracts, and we've now set him up with access to the SVN. He's working on 1852.

15/02/11

Permalink 02:38:19 pm, by mholmes, 16 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 45

Stats for Coldesp report

Compiled some stats for govlet.ca and coldesp for the ACDP performance indicators report, for CP.

14/02/11

Permalink 03:45:25 pm, by mholmes, 54 words, 71 views   English (CA)
Categories: Activity log; Mins. worked: 60

Admin and reporting

Spent an hour looking at some sample scans of maps from HBC, and discussing with CP by email what we might order in the way of digital images. Also some requirements for reporting for the ACDP have come down the pipe, so I've asked sysadmin to add govlet.ca to my Urchin stats set.

10/02/11

Permalink 04:16:33 pm, by kim, 171 words, 61 views   English (CA)
Categories: Activity log; Mins. worked: 5

KSW's Week-end Update

Here is where we sit for the moment:

I have to have a last file-by-file pass at the 1859 files to scan for missed people, place, vessel, and mentions of Indigenous People. Much of this has already been done with find/replace, for example, with the common names and places, but I need to do a final pass to catch the stragglers.

Theo continues to add place-name tags to files from the 1858 collection, as this was missed in the last round. This is a welcome change for the poor man, as I had him parsing images for eons. I suspect that by the time his time with us is completed (as an RA), he will have completed this task, perhaps more.

Shaun has turned to the writing of abstracts, picking up where we had left off: 1852. As he is here as part of a Directed Reading, through the work of Susan Doyle in the Pro. Writing Department, he will be leaving us when his semester ends, around the first week of April.

Permalink 04:04:05 pm, by kim, 32 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 5

Date tags: 1859

I have completed the date tags for 1859. There may be the odd one missing, but I will catch them when I scan through each file for new place, people, and vessel names.

09/02/11

Permalink 02:12:22 pm, by mholmes, 38 words, 70 views   English (CA)
Categories: Activity log; Mins. worked: 20

Abbreviation markup: Dept for Department

120 changes; these are the regexps:

(?<!abbr>)([Dd])ep(<hi rend="[^"]+super+[^"]+">t\.?</hi>\.?) 
<choice><abbr>$1ep$2</abbr><expan>$1epartment</expan></choice>
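
For anyone wanting to try this search-and-replace outside an XML editor, here is a minimal Python sketch of the same idea. The sample sentence and its `rend` value are invented for illustration. Note that the replacement here writes `\1ep\2` into `<abbr>` so that the full abbreviated form ("Dep" plus the superscript "t") is kept.

```python
import re

# Search pattern as logged above. The negative lookbehind skips any
# abbreviation that is already wrapped in <abbr>.
PATTERN = re.compile(
    r'(?<!abbr>)([Dd])ep(<hi rend="[^"]+super+[^"]+">t\.?</hi>\.?)'
)

# Group 1 is the initial letter (so case is preserved); group 2 is the
# superscript <hi> element plus any trailing full stop.
REPLACEMENT = r'<choice><abbr>\1ep\2</abbr><expan>\1epartment</expan></choice>'


def expand_dept(text: str) -> str:
    """Wrap 'Dep' + superscript 't' in a TEI choice/abbr/expan structure."""
    return PATTERN.sub(REPLACEMENT, text)


# Invented sample; the rend value is an assumption, not taken from the data.
sample = 'the War Dep<hi rend="vertical-align: super;">t</hi> has'
tagged = expand_dept(sample)
```

Because of the lookbehind, running `expand_dept` a second time over already-tagged text should leave it unchanged.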

07/02/11

Permalink 05:26:53 pm, by mholmes, 49 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 60

DB updates and OAI records regenerated

I'm not going to do this often, but the OAI records were very out of date, so I've regenerated them all locally and uploaded them back into the db. Takes a long time -- I wonder if there's a way to script it so it could run unattended somehow...

01/02/11

Permalink 03:05:44 pm, by mholmes, 5 words, 55 views   English (CA)
Categories: Activity log; Mins. worked: 15

Got the latest webstats

Grabbed the stats for January.

Permalink 01:30:24 pm, by mholmes, 27 words, 1701 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

CO Series 60 vol 6 page images now posted to the site

A further 1422 page-images have been added to the manuscript image browser, covering British Columbia 1859 Public Offices Part 2, and Miscellaneous. Transcriptions are now being linked into these images.

27/01/11

Permalink 01:50:11 pm, by mholmes, 160 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 60

Getting lists of name variants

To assist KSW with automating some of the name markup, I've been generating lists of distinct values for name variants, using this code:

xquery version "1.0";

declare namespace xdb="http://exist-db.org/xquery/xmldb";
declare namespace util="http://exist-db.org/xquery/util";
declare namespace f="http://exist-db.org/f-functions";
declare namespace tei="http://www.tei-c.org/ns/1.0";
(:declare namespace fn="http://www.w3.org/2005/xpath-functions";:)

(: Returns one <name> element per distinct untyped name variant tagged with the given key. :)
declare function f:getContents($id as xs:string) as element()*
{
    for $d in distinct-values(collection('/db/coldesp/correspondence/')//tei:name[not(@type)][@key = $id])
    return <name>{$d}</name>
};

(: Lists every person in the bios collection with the variants found in the correspondence. :)
<people>
{
    for $id in distinct-values(collection('/db/coldesp/bios/')//tei:person/@xml:id)
    return
        <person xml:id="{$id}">
            {f:getContents($id)}
        </person>
}
</people>

Throws up some interesting things that look like they might be typos, as well as many names that don't seem to have any mentions in the text. I'm investigating.
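
The same grouping can be sketched outside the database, for example to spot-check a handful of files. The following is a rough Python equivalent, not the query actually used: the sample document and the keys in it are invented, and unlike the XQuery it starts from the names found in the text rather than from the list of people in the bios, so people with no mentions will not appear at all.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def name_variants(docs: list[str]) -> dict[str, list[str]]:
    """Map each @key to the sorted distinct text forms of its untyped <name> tags."""
    variants: dict[str, set[str]] = defaultdict(set)
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        for name in root.iterfind(".//tei:name", TEI):
            key = name.get("key")
            # Names carrying a @type (places, vessels, etc.) are skipped,
            # mirroring the [not(@type)] predicate in the query.
            if key and name.get("type") is None:
                variants[key].add("".join(name.itertext()))
    return {key: sorted(forms) for key, forms in variants.items()}


# Invented sample document; the keys are hypothetical.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<name key="douglas_j">Douglas</name> wrote to <name key="douglas_j">Governor Douglas</name>
from <name type="place" key="victoria">Victoria</name>.</p></body></text></TEI>"""
result = name_variants([sample])
```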

25/01/11

Permalink 09:45:09 am, by mholmes, 39 words, 67 views   English (CA)
Categories: Activity log; Mins. worked: 30

OAI records updated again

Added in the First Nations information where it's been tagged as <dc:subject> tags, and regenerated the records. This goes pretty quickly in the 1.4.1 version of the db; I should definitely move forward with the port asap.

24/01/11

Permalink 05:12:46 pm, by mholmes, 82 words, 87 views   English (CA)
Categories: Activity log; Mins. worked: 120

OAI records updated and two site tweaks

Regenerated all the OAI records using my local version of the site based on eXist 1.4.1 (it seems to be faster, even without indexes properly configured), and also added a couple of links to the site as requested by CP, one to the Govlet site and one to the Libraries Early BC Maps page.

Generating these records takes a good while. I keep hitting little buglets in the XQuery which require me to restart the process. Hopefully we're pretty solid at this point.

Permalink 11:06:57 am, by mholmes, 134 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 60

Workaround for XQuery bug

Struggling with the strange behaviour I was seeing, where a function would execute correctly if called directly, but not if called from a for loop, I discovered two things: the same problem still exists in eXist 1.4.1, but there I see an actual error to the effect that the context is missing for a node; and I can eliminate the problem by rewriting some XPath inside the query. This is the XPath that was causing the problem:

for $n in (distinct-values($doc//@key[parent::tei:name[not(@type)]]))

Admittedly it's a bit perverse. If it's rewritten like this:

for $n in (distinct-values($doc//tei:name[not(@type)]/@key))

then the query works even when executed inside a loop. This means I can now generate all the complete OAI records that JD would like to see.

Permalink 09:37:18 am, by kim, 49 words, 72 views   English (CA)
Categories: Activity log; Mins. worked: 5

Page Break (PB) Tags Added to More 1859 Files

I have added Page-Break tags to the CO 60/5 files. All that remains among the 1859 files is CO 60/6, the images for which Theo will finish by week's end. In the meantime, I can begin tagging people, places, ships, and mentions of First Nations in the 1859 files, generally.

21/01/11

Permalink 02:27:09 pm, by mholmes, 125 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 120

More struggling with OAI

I've hit a snag with the OAI stuff which is almost certainly an eXist bug: when I generate an OAI record individually, passing an xml:id to the function that does it, then all the people, places etc. are included, but when I try to generate records in a loop by passing all the ids in, then that information is not included. I've tried configuring extra indexes and all sorts of other workarounds but it seems insuperable, and I'm certainly not going to generate 7,143 records one at a time. I'm now forced to look at updating to the new build of eXist/Cocoon to see if the bug is present there. If it's not, then that's a solid reason for doing the migration right now.

20/01/11

Permalink 03:36:14 pm, by mholmes, 28 words, 1607 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 20

CO Series 6 vol 30 page images now posted to the site

A further 1250 page-images have been added to the manuscript image browser, covering British North America 1859 Public Offices/Hudson's Bay Company. Transcriptions are now being linked into these images.

19/01/11

Permalink 03:44:50 pm, by mholmes, 28 words, 348 views   English (CA)
Categories: Activity log, Announcements; Mins. worked: 30

CO Series 60 vol 5 page images now posted to the site

A further 1108 page-images have been added to the manuscript image browser, covering BC 1859 Despatches to London and Public Offices Part 1. Transcriptions are now being linked into these images.

Permalink 11:04:17 am, by mholmes, 194 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 120

OAI metadata bugs fixed

  • Fixed the various bugs in the oai_update.xq file, so that the <record> elements have the right schema and namespace data. I also realized that the <dc:identifier> element should contain a full working URL to the transcription file, so now it does.
  • Fixed the bugs in oai.xq so that the @xml:id attribute no longer appears on the <record> element when outputting it in response to a request.
  • Deleted all my oai record files and re-generated them. This is done (at the moment) by starting up my local copy of the db, starting the webstart client, and then copy/pasting the oai_update.xq file into the code area and running it. There are more elegant ways, but doing it through a web interface would require building authentication into the app, or loosening up the permissions for the guest user more than I'd like.

I'm now uploading the records to the main db, and I'll do the requisite testing on that using the online tool linked in my prior post. Once that's done, I'll consider it functional and ready for testing by the library folks.
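
To illustrate the record shape described in the list above (no attributes on `<record>`, and a `<dc:identifier>` holding a full URL to the transcription), here is a minimal Python sketch. It is not the oai_update.xq code: the `oai:example.org:` identifier prefix, the base URL, and the `.scx` file pattern are assumptions made for the example. The three namespace URIs are the standard OAI-PMH, oai_dc and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"


def build_record(doc_id: str, base_url: str, datestamp: str) -> ET.Element:
    """Build a bare OAI <record> whose dc:identifier is a full document URL."""
    # The document id goes into the header and the URL only; the <record>
    # element itself carries no attributes (in particular, no xml:id).
    record = ET.Element(f"{{{OAI}}}record")
    header = ET.SubElement(record, f"{{{OAI}}}header")
    ET.SubElement(header, f"{{{OAI}}}identifier").text = f"oai:example.org:{doc_id}"
    ET.SubElement(header, f"{{{OAI}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI}}}metadata")
    dc = ET.SubElement(metadata, f"{{{OAI_DC}}}dc")
    ET.SubElement(dc, f"{{{DC}}}identifier").text = f"{base_url}/{doc_id}.scx"
    return record


# Hypothetical values throughout.
record = build_record("B597018", "http://example.org/docs", "2011-01-19")
xml_text = ET.tostring(record, encoding="unicode")
```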

18/01/11

Permalink 03:51:08 pm, by mholmes, 127 words, 65 views   English (CA)
Categories: Activity log; Mins. worked: 240

OAI metadata system almost working

I now have the paging and resumptionToken functionality working, and I've started testing the repository output using this online tool. Mostly it's working fine, but I have three issues with validation of the output of ListRecords -- I'm storing the id of the document in an xml:id attribute in the record element, and that's not allowed; I have a broken xmlns:xsi attribute in some of the records caused by a now-fixed bug in the oai_update.xq; and the xsi:schemaLocation attribute seems not to be allowed (probably caused by the foregoing issue). I should be able to get these fixed by tomorrow; I'll have to regenerate all the OAI files, though, and then export/import them from my local machine to the Pear instance.
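
The general idea behind resumptionToken paging can be sketched as follows. This is a simplified illustration and not the code in oai.xq: using the next offset as the token, and the page size of 100, are assumptions chosen to keep the example short. `badResumptionToken` is the error code the OAI-PMH specification defines for an invalid token.

```python
from typing import Optional

PAGE_SIZE = 100  # hypothetical page size


def list_records(
    ids: list[str],
    token: Optional[str] = None,
    page_size: int = PAGE_SIZE,
) -> tuple[list[str], Optional[str]]:
    """Return one page of record ids plus the token for the next page, if any."""
    # In this sketch the token is simply the offset of the next record.
    start = int(token) if token else 0
    if start < 0 or start > len(ids):
        raise ValueError("badResumptionToken")
    page = ids[start:start + page_size]
    next_start = start + len(page)
    # No token is returned once the list is exhausted, which tells the
    # harvester to stop asking.
    next_token = str(next_start) if next_start < len(ids) else None
    return page, next_token


# A harvester keeps requesting pages until no token comes back.
all_ids = [f"doc{n:04d}" for n in range(250)]
pages = []
token = None
while True:
    page, token = list_records(all_ids, token)
    pages.append(page)
    if token is None:
        break
```

A real implementation would probably also want the token to encode the original request parameters, so that a harvester cannot resume one query with a token issued for another.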

17/01/11

Permalink 05:01:22 pm, by mholmes, 59 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 120

OAI progress and general catchup

Updated changed files in the db; added the OAI files to SVN; and continued work on the OAI interface. I now have everything working except for the paging out of results with the resumption token, which I hope to complete tomorrow. Then I'll need to add better indexing for the OAI files, and we should be ready to go.

Permalink 11:07:18 am, by kim, 110 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 5

Shaun on board to edit and write

Shaun Macpherson will be joining the project this week, in an editing and copyediting capacity.

His work for the project is part of a directed study course that he and Susan Doyle have constructed, with input from Martin and me, as part of the English Department's Professional Writing program. This is an unpaid position, and Shaun is working for course credit. Shaun will work/learn for 6 hours per week under my guidance and direction, until sometime in April--roughly, the semester's end.

We look forward to his contributions! To start, he will incorporate Frank Leonard's latest batch of biographies from 1848; he will then move on to write, edit, and code abstracts.

10/01/11

Permalink 03:57:46 pm, by kim, 38 words, 64 views   English (CA)
Categories: Activity log; Mins. worked: 5

1859 Image Processing: Update

Just a quick update to say that Theo is processing images for the CO 60/6 collection, of which there are 117 files in 1859. I will now move on to process the images for CO 60/5, of which there are 98 files in 1859.

Permalink 03:45:34 pm, by kim, 43 words, 69 views   English (CA)
Categories: Activity log; Mins. worked: 5

Page Break Tag Update

Finished the addition of PB tags to the 1859 files found in CO 305/13 and CO 410/1. The latter, letterbook copies, pointed occasionally to enclosures that exist in the originals. Presumably, if and when we track down the originals, we will have to add these items.

05/01/11

Permalink 02:43:23 pm, by mholmes, 525 words, 72 views   English (CA)
Categories: Activity log; Mins. worked: 180

Preparing Coldesp for future migration to latest Cocoon/eXist builds

The Cocoon/eXist build that's housing the current ColDesp application is pretty long in the tooth, and we're going to need some newer features in the coming months (see the previous post). So I've been preparing the way for migration by testing the current application as it runs inside our latest build. This is what I had to do:

  • Made a backup of the whole db from my local Coldesp app.
  • Copied a fresh Cocoon/eXist into my local Tomcat's webapps directory as "coldesp2".
  • Copied the "site" directory from the old version.
  • Restored the content to the db (there were a couple of error messages during this, but as far as I can see, nothing is missing).
  • Started up the app, and discovered that there were sitemap errors, so started commenting things out. To get it working, I had to comment out this from the top of the file:
    <map:generators default="file">
        <map:generator logger="sitemap.generator.text" name="text" src="org.apache.cocoon.generation.TextGenerator"/>
    </map:generators>
    
    and all instances of this:
    <map:transform type="session"/>
    I don't think the former was actually used (no sign so far), and the only possible function of the latter was to handle the authentication for blocking access to non-UVic users; that will need some careful testing.

These are my conclusions so far:

Working:

  • Regular site pages (home, About etc.)
  • Browse the collection
  • View a document
  • View annotation content (people, places, etc.)
  • Search
  • Map gallery and maps
  • Indexes
  • Schedules

Failing or suspect:

  • Scan images (clicking on a collection results in a long wait caused by an out-of-memory error). I suspect this might be caused by an indexing problem, resulting from one of the collection.xconf files not being quite right for the new eXist. I saw an error about a collection.xconf file go by during the restore. I've tried editing the collection.xconf file quickly to add a tei namespace prefix and reindexed the collection, but that didn't help. I should probably look at that pipeline and see exactly what's happening. There's no reason it should eat up much memory. On the other hand, when I hit the same page in my old Coldesp app (local Tomcat) I get the same error, so it looks as though this is simply an optimization issue that can be solved first for the old version and the same fix might work for the new.
  • Clicking on a snippet to go to its document resulted in an error once (something in my xqSearchUtils.jar library). This might be a fluke, or might be something more significant.

This is not exhaustive, but it's quite heartening; it suggests that reworking the collection.xconf files (and perhaps adding some for the actual documents) could be all we need to do to get a basic working web application. I can then test it side-by-side with the old one, to determine relative speed and see if there are any slowdowns that need to be worked on in the new version. Assuming it performs at least as well as the old one, there's no reason not to migrate.

Permalink 12:45:37 pm, by mholmes, 466 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 120

OAI interface progress

Today I added two new functions, f:pruneRecord($recId as xs:string) and f:pruneRecords(). These check one or all of the OAI records in the database to see if there's a parallel correspondence document for it; if not, it's presumably been deleted, and the OAI record is also deleted. Tested (locally) and working.
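
The pruning logic amounts to a set difference between the ids of the OAI records and the ids of the correspondence documents. A minimal Python sketch of that idea follows; it is not the XQuery implementation, and the function name and sample ids are hypothetical.

```python
def prune_records(oai_ids: set[str], doc_ids: set[str]) -> tuple[set[str], set[str]]:
    """Split OAI record ids into those to keep and those to delete.

    A record is an orphan (to be deleted) when no correspondence
    document with the same id exists any more.
    """
    orphans = oai_ids - doc_ids
    kept = oai_ids - orphans
    return kept, orphans


# Invented ids, for illustration only.
oai_ids = {"B597018", "V597011", "V590001"}
doc_ids = {"B597018", "V597011", "B590002"}
kept, orphans = prune_records(oai_ids, doc_ids)
```

Documents that have no OAI record yet (such as `B590002` in the sample) are left alone here; creating records for them would be the job of the update routine rather than the pruning one.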

Although the actual API for OAI requests and responses is relatively simple, the specification allows for a harvester to make large numbers of requests which can require significant amounts of data from the database; this is especially so in the case of our collection, in which we have a great deal of metadata to offer about every document. There is the risk that a single harvester could hit the db with enough data requests to slow the site for other users. For that reason, I'm taking a three-stage approach to designing the OAI features:

First of all, I have a set of routines that generate and maintain individual OAI records for every document in the correspondence. This is basically complete, and I have a full set of OAI records in my development version of the db; I also have routines written which check all the existing documents and refresh the metadata if they've changed, delete obsolete metadata records, and so on. This part of the work is done.

The next stage will be to write the actual query interface which allows a harvester to request this metadata from the db. Once that's done, the db will be able to provide metadata to a harvester in accordance with the specification.

Finally, I need to address the issue of maintaining the OAI records in the live database. The process of checking and regenerating records takes quite a long time at the moment (between ten and thirty minutes, depending on what it has to do). I don't want to be running that kind of intensive process on the live db. In the meantime, I can run it periodically on my local copy of the db, and upload the resulting records, but ultimately we want to have a more flexible system whereby any change to a document results in an update to its associated OAI record. I can do this using triggers in the eXist database, but it will require an update to a more recent version of eXist. This is something I've been planning for a while, and it will bring other benefits, but it's something I'll have to test carefully before we deploy it.

So I'd expect actual OAI functionality to go live by the end of the month, if nothing unexpected intervenes, and then I'll start the port to the new version of eXist. If that goes smoothly, implementing triggers shouldn't be too complicated, and then the OAI records will maintain themselves.

04/01/11

Permalink 04:39:48 pm, by mholmes, 17 words, 75 views   English (CA)
Categories: Activity log; Mins. worked: 45

Input for grant application

KSW and I wrote some content for JL and CP, to help with the latest grant application.

Permalink 08:03:36 am, by mholmes, 11 words, 63 views   English (CA)
Categories: Activity log; Mins. worked: 15

Stats for 2010 retrieved

Got the complete 2010 stats (the ones we care about) from Megapode.

Colonial Despatches

The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.
