Simple XQuery to pull out the data:
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
<maps xmlns="http://hcmc.uvic.ca">
{
for $t in //tei:TEI
return
<map xml:id="{$t/@xml:id}">
{
if ($t//tei:title) then
<title>{$t//tei:title[1]/text()}</title>
else
()
}
{
if ($t//tei:idno[@type="penfoldNum"]) then
<penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>
else
()
}
</map>
}
</maps>
I might have to add more data points to the output; in fact it might be worth just pulling out the whole of the sourceDesc. I'm currently looking at the possibility of enhancing my UniSymMetric Java class so it could be called as an extension function from XSLT in Saxon; that would give me a fallback when there's no Penfold number, and it might be handy in all sorts of other ways too.
JD pointed me at an OAI feed from ContentDM, which is exactly what I need for my metadata harvesting. This is my plan:
I've started work on an XSLT stylesheet to do the job. The purpose of the stylesheet is to process detailed OAI metadata records which use Dublin Core identifiers into teiHeader elements suitable for adding to TEI documents Despatches project.
The OAI metadata is in the file oai_from_contentdm.xml, and originates in the UVic Library's ContentDM system. It contains 261 records relating to Early BC Maps, and most of these are maps also in the Colonial Despatches project collection. The ContentDM metadata is well-organized and has been considerably enhanced, so we're going to take that data and generate new teiHeader elements for our TEI files from it.
The first stage is to create a mapping between each of the fields in the OAI data and the location in the teiHeader where we propose to store it.
Input documents:
Output documents:
Adding this as a task for me, long-term, because it needs to be part of the plan for the next phase of the project.
I had pointed JT at fo_925-1650_pt_1_24_vic_harbour_1847, which is Penfold 576, for the Kellett map of Victoria Harbour, but it turns out he wanted Penfold 577, which is fo_925-1807_vic_1848. I've slightly enriched the metadata for 577 using data from ContentDM, manually, but there should be a way to do this mechanically because the ContentDM metadata is organized into clear fields. Ultimately, it would be a good idea to find some way to get at this metadata and pull it into our headers, so we'll have to write a mapping between the two. Here's an example of the ContentDM data in HTML:
http://contentdm.library.uvic.ca/cdm/singleitem/collection/collection5/id/130/rec/2
It claims to be XHTML, but it's not even well-formed, never mind valid, so it couldn't be parsed with e.g. XSLT unless it was tidied first. Hopefully there's a more helpful feed from it. I'm contacting JD about that.
The Colonial Despatches is an XML database project which is creating a digital archive containing the original correspondence between the British Colonial Office and the colonies of Vancouver Island and British Columbia. The project lives at http://bcgenesis.uvic.ca, and the web application runs on the Pear dev Tomcat. The XML data is managed in SVN at http://revision.tapor.uvic.ca/svn/coldesp/.
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|
| << < | Current | > >> | ||||
| 1 | 2 | 3 | ||||
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 25 | 26 | 27 | 28 | 29 | 30 | |