Archives for: March 2012, 30

30/03/12

Permalink 02:06:00 pm, by mholmes, 163 words, 97 views   English (CA)
Categories: Activity log; Mins. worked: 30

map_lookup.xml done

Simple XQuery to pull out the data:

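xquery version "1.0";

declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Build one <map> entry per TEI file: its xml:id, title and Penfold number. :)
<maps xmlns="http://hcmc.uvic.ca">
{
  for $t in //tei:TEI
  return
    <map xml:id="{$t/@xml:id}">
    {
      (: Parenthesized so that only the first title in the document is used;
         $t//tei:title[1] would match the first title under every parent. :)
      if ($t//tei:title) then
        <title>{($t//tei:title)[1]/text()}</title>
      else
        ()
    }
    {
      (: Not every map has a Penfold number, so this element is optional. :)
      if ($t//tei:idno[@type="penfoldNum"]) then
        <penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>
      else
        ()
    }
    </map>
}
</maps>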

I might have to add more data points to the output; in fact it might be worth just pulling out the whole of the sourceDesc. I'm currently looking at the possibility of enhancing my UniSymMetric Java class so it could be called as an extension function from XSLT in Saxon; that would give me a fallback when there's no Penfold number, and it might be handy in all sorts of other ways too.

Permalink 10:56:36 am, by mholmes, 306 words, 71 views   English (CA)
Categories: Activity log; Mins. worked: 60

Importing metadata from ContentDM

JD pointed me at an OAI feed from ContentDM, which is exactly what I need for my metadata harvesting. This is my plan:

I've started work on an XSLT stylesheet to do the job. The purpose of the stylesheet is to process detailed OAI metadata records which use Dublin Core identifiers into teiHeader elements suitable for adding to the TEI documents in the Despatches project.

The OAI metadata is in the file oai_from_contentdm.xml, and originates in the UVic Library's ContentDM system. It contains 261 records relating to Early BC Maps, and most of these are maps also in the Colonial Despatches project collection. The ContentDM metadata is well-organized and has been considerably enhanced, so we're going to take that data and generate new teiHeader elements for our TEI files from it.

The first stage is to create a mapping between each of the fields in the OAI data and the location in the teiHeader where we propose to store it.
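
A rough sketch of what such a mapping might look like, written in XQuery rather than as the stylesheet itself. It assumes the records are plain oai_dc (unqualified Dublin Core) wrapped in the standard OAI-PMH record structure. The function name local:header, the choice of Dublin Core fields, the target locations in the teiHeader, the "contentdm" idno type and the publicationStmt wording are all illustrative assumptions, not the project's actual mapping.

```xquery
xquery version "1.0";

declare namespace oai = "http://www.openarchives.org/OAI/2.0/";
declare namespace oai_dc = "http://www.openarchives.org/OAI/2.0/oai_dc/";
declare namespace dc = "http://purl.org/dc/elements/1.1/";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: turn one oai_dc record into a skeleton teiHeader.
   The field-to-element choices below are assumptions for illustration. :)
declare function local:header($rec as element(oai:record)) as element(tei:teiHeader) {
  let $dc := $rec/oai:metadata/oai_dc:dc
  return
    <tei:teiHeader>
      <tei:fileDesc>
        <tei:titleStmt>
          <tei:title>{string(($dc/dc:title)[1])}</tei:title>
        </tei:titleStmt>
        <tei:publicationStmt>
          <tei:p>Generated from ContentDM OAI metadata.</tei:p>
        </tei:publicationStmt>
        <tei:sourceDesc>
          <tei:bibl>
          {
            (: Repeatable Dublin Core fields become repeated TEI elements. :)
            for $c in $dc/dc:creator return <tei:author>{string($c)}</tei:author>,
            for $d in $dc/dc:date return <tei:date>{string($d)}</tei:date>,
            for $i in $dc/dc:identifier return <tei:idno type="contentdm">{string($i)}</tei:idno>
          }
          </tei:bibl>
        </tei:sourceDesc>
      </tei:fileDesc>
    </tei:teiHeader>
};

for $rec in doc("oai_from_contentdm.xml")//oai:record
return local:header($rec)
```

Writing the mapping as one small function per record keeps each decision (which field goes where) in a single visible place, which should make it easy to revise once the real field list has been worked through.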

Input documents:

  • oai_from_contentdm.xml (OAI record set).
  • ../xml/maps/*.xml (TEI documents for each of the maps)
  • map_lookup.xml (simple XML document which hopefully provides enough data to allow this transformation process to retrieve the correct TEI document for each record in the OAI data. This lookup will be based on a number of factors, including Penfold number, title, and descriptive information. Creating this file is the next stage in the process.)
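
The lookup step could be sketched along these lines: try the Penfold number first, then fall back to a case-insensitive title match. This is only an outline under assumptions. In particular, it assumes the Penfold number turns up as a bare dc:identifier value in the OAI data, which has not been checked. The function local:match and the match report element are hypothetical names, and matching on descriptive information is not attempted here.

```xquery
xquery version "1.0";

declare namespace oai_dc = "http://www.openarchives.org/OAI/2.0/oai_dc/";
declare namespace dc = "http://purl.org/dc/elements/1.1/";
declare namespace hcmc = "http://hcmc.uvic.ca";

(: Hypothetical helper: find the map_lookup.xml entries for one OAI record.
   Assumption: the Penfold number is carried in a dc:identifier value. :)
declare function local:match(
  $dc as element(oai_dc:dc),
  $maps as element(hcmc:map)*
) as element(hcmc:map)* {
  let $ids := for $i in $dc/dc:identifier return normalize-space($i)
  let $titles := for $t in $dc/dc:title return lower-case(normalize-space($t))
  let $by-penfold := $maps[hcmc:penfold[normalize-space(.) = $ids]]
  return
    (: Prefer the Penfold match; fall back to the title only when it fails. :)
    if (exists($by-penfold)) then
      $by-penfold
    else
      $maps[lower-case(normalize-space(hcmc:title)) = $titles]
};

let $maps := doc("map_lookup.xml")/hcmc:maps/hcmc:map
for $dc in doc("oai_from_contentdm.xml")//oai_dc:dc
let $hits := local:match($dc, $maps)
return
  <match title="{($dc/dc:title)[1]}" count="{count($hits)}">
    {for $m in $hits return string($m/@xml:id)}
  </match>
```

Reporting the hit count for each record makes it easy to spot records with no match or with more than one, which are the cases that would need checking by hand.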

Output documents:

  • ../xml/maps_enhanced/*.xml (from each TEI document we have, create an enhanced version which incorporates the original @xml:id and metadata, as well as the facsimile element with data about the image file, but also builds in the metadata gleaned from the OAI file. These files will eventually replace the original TEI files in the Despatches site, once the Map Gallery code has been rewritten to work with them.)

Permalink 08:25:26 am, by mholmes, 178 words, 63 views   English (CA)
Categories: Activity log; Mins. worked: 30

Map confusion and metadata

Adding this as a task for me, long-term, because it needs to be part of the plan for the next phase of the project.

I had pointed JT at fo_925-1650_pt_1_24_vic_harbour_1847, which is Penfold 576, for the Kellett map of Victoria Harbour, but it turns out he wanted Penfold 577, which is fo_925-1807_vic_1848. I've slightly enriched the metadata for 577 using data from ContentDM, manually, but there should be a way to do this mechanically because the ContentDM metadata is organized into clear fields. Ultimately, it would be a good idea to find some way to get at this metadata and pull it into our headers, so we'll have to write a mapping between the two. Here's an example of the ContentDM data in HTML:

http://contentdm.library.uvic.ca/cdm/singleitem/collection/collection5/id/130/rec/2

It claims to be XHTML, but it's not even well-formed, never mind valid, so it couldn't be parsed with e.g. XSLT unless it was tidied first. Hopefully there's a more helpful feed from it. I'm contacting JD about that.

Colonial Despatches

The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.
