More work on matching ContentDM data with our maps

05/04/12

Permalink 02:46:34 pm, by mholmes, 230 words, 103 views   English (CA)
Categories: Activity log; Mins. worked: 90

I've done some preliminary alignment with XSLT to find out which of our maps can be matched with entries from ContentDM:

  • 176 items have matching Penfold numbers. These would be reliable matches.
  • I've matched a further 9 items based on catalogue ids.
  • One item for which we have a Penfold number appears to have no match in ContentDM: #549, mpg_1-557_3_queen_charlotte_sound_1792.
  • 76 items in ContentDM have no match (via Penfold) in our collection.
  • In addition to #549, 33 items in our collection have no match in ContentDM.
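The tallies above amount to a keyed join with leftovers on both sides. As a rough illustration only (this is Python, not the XSLT actually used, and every record, id, and field name below is invented), the two-pass match on Penfold number and then catalogue id might look like this:

```python
# Hypothetical records: ids, Penfold numbers, and catalogue ids are made up.
our_maps = [
    {"id": "map_a", "penfold": "101", "doc_id": "cat-1"},
    {"id": "map_b", "penfold": None, "doc_id": "cat-2"},
    {"id": "map_c", "penfold": "549", "doc_id": None},
    {"id": "map_d", "penfold": None, "doc_id": None},
]
contentdm = [
    {"id": "cdm_1", "penfold": "101", "doc_id": None},
    {"id": "cdm_2", "penfold": None, "doc_id": "cat-2"},
    {"id": "cdm_3", "penfold": "777", "doc_id": None},
]


def align(ours, theirs):
    """Match on Penfold number first, then on catalogue id; report leftovers."""
    # Index the ContentDM side by each key, skipping records that lack it.
    by_penfold = {r["penfold"]: r["id"] for r in theirs if r["penfold"]}
    by_doc_id = {r["doc_id"]: r["id"] for r in theirs if r["doc_id"]}

    matched = {}  # our id -> (ContentDM id, key that produced the match)
    for rec in ours:
        if rec["penfold"] in by_penfold:
            matched[rec["id"]] = (by_penfold[rec["penfold"]], "penfold")
        elif rec["doc_id"] in by_doc_id:
            matched[rec["id"]] = (by_doc_id[rec["doc_id"]], "doc_id")

    # Anything not paired up is left over on its own side.
    unmatched_ours = [r["id"] for r in ours if r["id"] not in matched]
    used = {cdm_id for cdm_id, _ in matched.values()}
    unmatched_theirs = [r["id"] for r in theirs if r["id"] not in used]
    return matched, unmatched_ours, unmatched_theirs


matched, unmatched_ours, unmatched_theirs = align(our_maps, contentdm)
```

Recording which key produced each match keeps the reliable Penfold matches separate from the catalogue-id ones, so the latter can be checked by hand.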

It seems likely that many of these items actually do match, but because they have no Penfold numbers or matching ids, I'll have to match them with some sort of fuzzy matching approach.
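One possible shape for that fuzzy matching, sketched in Python with the standard library's difflib: normalise both titles, score every candidate, and keep the best one only if it clears a threshold. The function names, the 0.6 threshold, and the sample titles are all assumptions for illustration; no approach has been chosen yet.

```python
import re
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case and turn punctuation and underscores into single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def best_match(title, candidates, threshold=0.6):
    """Return (candidate, score) for the closest title, or None if too weak.

    The threshold is an arbitrary starting point and would need tuning
    against pairs that are known to match.
    """
    target = normalise(title)
    scored = [
        (c, SequenceMatcher(None, target, normalise(c)).ratio())
        for c in candidates
    ]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Illustrative titles only; real candidates would come from ContentDM.
candidates = ["queen_charlotte_sound_1792", "Plan of Victoria Harbour"]
result = best_match("Queen Charlotte Sound, 1792", candidates)
```

Anything returned this way would still be a candidate pairing for manual review rather than a confirmed match.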

I regenerated my map_lookup.xml file with a bit of added data:

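xquery version "1.0";

(: The original listing breaks off after the penfold/docId block. The closing
   tags and brace at the end are the minimum needed to make the query
   well-formed; any further fields in the original are not shown. :)

declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

<maps xmlns="http://hcmc.uvic.ca">
{
  for $t in //tei:TEI
  return
    <map xml:id="{$t/@xml:id}">
    {
      (: First title in document order, if there is one. :)
      if ($t//tei:title) then
        <title>{($t//tei:title)[1]/text()}</title>
      else
        ()
    }
    {
      (: Penfold number and catalogue id, emitted only when a Penfold number exists. :)
      if ($t//tei:idno[@type="penfoldNum"]) then
        (
          <penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>,
          <docId>{$t//tei:idno[@type="doc_id"]/text()}</docId>
        )
      else
        ()
    }
    </map>
}
</maps>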
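For checking the regenerated file outside the XML toolchain, here is a small Python sketch that indexes a lookup document by Penfold number. It assumes map_lookup.xml has the shape the query above suggests (maps, map, title, penfold, docId in the http://hcmc.uvic.ca namespace); the sample content and the function name are invented.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://hcmc.uvic.ca"}
# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up sample in the assumed shape of map_lookup.xml.
SAMPLE = """<maps xmlns="http://hcmc.uvic.ca">
  <map xml:id="example_map_1">
    <title>Example title</title>
    <penfold>123</penfold>
    <docId>abc</docId>
  </map>
  <map xml:id="example_map_2">
    <title>Another example</title>
  </map>
</maps>"""


def index_by_penfold(xml_text):
    """Map each Penfold number to the xml:id of the map that carries it."""
    root = ET.fromstring(xml_text)
    index = {}
    for m in root.findall("h:map", NS):
        penfold = m.findtext("h:penfold", default="", namespaces=NS).strip()
        if penfold:  # maps without a Penfold number are skipped
            index[penfold] = m.get(XML_ID)
    return index


lookup = index_by_penfold(SAMPLE)
```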

Colonial Despatches

The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.
