I'm now generating SQL statements for Oracle in consultation with AW, and I think we've arrived at a mapping of the Carrier data onto the existing FV Oracle fields. I've generated a 2.8MB SQL file full of inserts, and sent it off for testing. There's nothing remarkable about the XSLT that generates it, so it's not worth posting; it just takes a lot of care around apostrophes, which delimit string literals in Oracle SQL.
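To illustrate the apostrophe handling, here is a minimal sketch in Python (not the XSLT actually used) of how an embedded apostrophe can be doubled so that it survives inside a quoted SQL string. The table name, column names and sample values are hypothetical placeholders, not the real FV schema or Carrier data.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded apostrophes."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table, row):
    """Build one INSERT statement from a dict of column name -> value."""
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


if __name__ == "__main__":
    # Hypothetical table, columns and values, for illustration only.
    print(insert_statement("example_table", {"term": "'abc", "gloss": "dog's tail"}))
```

Doubling the apostrophe is the standard SQL escape for string literals; a leading apostrophe in a term therefore ends up as three apostrophes in a row in the generated file.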
A further 1422 page-images have been added to the manuscript image browser, covering British Columbia 1859 Public Offices Part 2, and Miscellaneous. Transcriptions are now being linked into these images.
find media -name '*.mov' -exec ffmpeg2theora --videoquality 6 {} \;
Feb. 1: Request from Rel. Studies:
Removed the job posting from the Faculty-Job Opportunities page, as the posting is now closed.
Finished the first pass of the dynamic 'related videos' sidebar, which displays a random list of videos related to the one the user is currently watching. The code is contained within a findRelatedVideos() function to keep the player.php page reasonably clean. The function is fully documented and fairly simple to use. Though the function currently selects videos based on the <nationality> element within the <person> element, it can switch to any other element within <person> simply by changing the function arguments.
Here's the function doc verbatim:
/**
* Retrieves a list of videos related to $id based on the contents of $field.
* The field - for example, 'nationality' or 'residence' - should be within the
* <person> element in the video XML file. The default number of videos returned
* is 5, but can be configured with the optional $options array. If the number
* of related videos is greater than the number of videos to be returned, then
* they're chosen randomly.
*
* The videos are transformed into <li> elements in includes/xslt/related.xslt.
*
* Due to the structure of the system, the eXist $db object and the results of
* the XML query must also be passed.
*
* The function returns an associative array:
* - videos: the formatted video <li> elements, transformed via XSLT
* - field: the value of $field
*
* @param string $id Video ID
* @param object $db DB object
* @param mixed $xmlResult Result of query from player.php
* @param string $field Name of related field within <person>
* @param array $options Optional settings OPTIONAL
* @return array An array with the videos and the value of $field
* @author Jamie Nay
* @date 2011-01-31
*/
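The selection rule described in the doc comment can be sketched roughly as follows. This is a Python illustration only, not the PHP function itself: it works on an assumed in-memory list of dicts rather than the eXist query result, it leaves out the XSLT step that produces the <li> elements, and it assumes that "field" in the returned array holds the matched value. The name find_related_videos and its parameters are hypothetical.

```python
import random


def find_related_videos(video_id, videos, field="nationality", limit=5, rng=random):
    """Return up to `limit` videos that share `field` with `video_id`.

    `videos` is a list of dicts standing in for the <person> data
    (an assumed shape for this sketch).
    """
    current = next(v for v in videos if v["id"] == video_id)
    value = current.get(field)
    if value is None:
        # Nothing to match on, so nothing is related.
        return {"videos": [], "field": None}
    related = [
        v for v in videos
        if v["id"] != video_id and v.get(field) == value
    ]
    # Choose randomly only when there are more candidates than slots.
    if len(related) > limit:
        related = rng.sample(related, limit)
    return {"videos": related, "field": value}
```

Passing a seeded random.Random instance as rng makes the selection repeatable, which is handy when checking the sidebar output by hand.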
Couldn't get out on time -- helping folks get set up on machines, etc.
Rules changed again a couple more times, but I've talked with AW and it looks like he'll want SQL insertions. Meanwhile, I've finished the sort, and elaborated that file into a more complex one which unrolls entries with multiple definitions and example sentences into multiple entries; it also cleans up whitespace and puts all the parts of the entries into the right order. These are my notes, and the crucial final file:
Files:
carrier_5.csv: CSV file generated from MDB file reader, with fields delimited by | and text quoted with `.
carrier_5.fods: FODS file (flat XML spreadsheet file) saved from OOo Calc after opening the CSV.
carrier_5_cleaned.xml: XML file generated from carrier_5.fods using clean_fods.xsl. This cleans up a number of idiosyncrasies, such as colspan-type structures, which make the XML hard to process.
carrier_5_cleaned_customized.xml: XML file generated from carrier_5_cleaned.xml using convert_cleaned_fods.xsl. This converts the anonymous FODS file elements into recognizable, human-readable tag names.
carrier_5_cleaned_customized_ordered.xml: XML file generated from carrier_5_cleaned_customized.xml using carrier_order.xsl. This sorts all the entries, generates duplicate entries for multiple examples, and puts the data in the right order inside each entry tag.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
xmlns:mdh="http://www.mholmes.com/namespaces/xslt"
exclude-result-prefixes="xs xd"
version="2.0">
<xd:doc scope="stylesheet">
<xd:desc>
<xd:p><xd:b>Created on:</xd:b> Jan 28, 2011</xd:p>
<xd:p><xd:b>Author:</xd:b> mholmes</xd:p>
<xd:p></xd:p>
</xd:desc>
</xd:doc>
<!-- This function generates a string on which we can sort the data, starting
from the Dakelh term field. -->
<xsl:function name="mdh:tweak" as="xs:string">
<xsl:param name="inString" as="xs:string"/>
<xsl:variable name="trimmedInput" select="normalize-space($inString)"/>
<xsl:choose>
<xsl:when test="string-length($inString) gt 0">
<!-- Replace all accented vowels with their unaccented equivalents. -->
<xsl:variable name="accentsGone" select="translate(normalize-space(lower-case($trimmedInput)), 'áéíóú', 'aeiou')"/>
<!-- Get rid of all combining underscores (u+0331 and u+0332).-->
<xsl:variable name="underscoresGone" select="replace($accentsGone, '̱|̲', '')"/>
<!-- Now before we remove the apostrophes, we need to replace some pairs of letters that need to sort as if they were one.
We can use a character following z to replace a character that needs to sort after all the rest. For example:
kh needs to sort after ka, kb, kc, kz, so we can replace kh with k}
So we do:
g becomes {
h becomes }
l becomes ~
o becomes ¥
s becomes ¦
w becomes §
z becomes ©
We also USED TO want to preserve the initial apostrophe,
because that leads to sorting words to the beginning
of the list; but we wanted to remove other apostrophes,
because they were not used in sorting. So we replaced
the initial one with !. However, that requirement was
changed, so those bits of the code are commented out.
-->
<xsl:variable name="gReplaced" select="replace($underscoresGone, 'ng', 'n{')"/>
<xsl:variable name="hReplaced" select="replace($gReplaced, '(c|g|k|l|s|w)h', '$1}')"/>
<xsl:variable name="lReplaced" select="replace($hReplaced, '(d|t)l', '$1~')"/>
<xsl:variable name="oReplaced" select="replace($lReplaced, 'oo', 'o¥')"/>
<xsl:variable name="sReplaced" select="replace($oReplaced, 'ts', 't¦')"/>
<xsl:variable name="wReplaced" select="replace($sReplaced, '(g|k)w', '$1§')"/>
<xsl:variable name="zReplaced" select="replace($wReplaced, 'dz', 'd©')"/>
<!--<xsl:variable name="firstAposReplaced" select="replace($zReplaced, '^''', '!')"/>
<xsl:variable name="aposGone" select="replace($firstAposReplaced, '''', '')"/>-->
<xsl:variable name="aposGone" select="replace($zReplaced, '''', '')"/>
<xsl:value-of select="$aposGone"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="noTerm">__NO DAKELH TERM IN THIS ENTRY.</xsl:variable>
<xsl:value-of select="$noTerm"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!-- This template generates an entry from another entry, but
includes only one of the example sentence sets. -->
<xsl:template name="basicEntry">
<xsl:param name="inputEntry"/>
<xsl:param name="transNum"/>
<xsl:param name="sortString"/>
<entry>
<xsl:attribute name="id" select="concat(normalize-space(id), '_', $transNum)"/><xsl:text>
</xsl:text>
<id><xsl:value-of select="normalize-space(id)"/></id><xsl:text>
</xsl:text>
<dakTerm><xsl:value-of select="normalize-space(dakTerm)"/></dakTerm><xsl:text>
</xsl:text>
<sylls><xsl:value-of select="normalize-space(sylls)"/></sylls><xsl:text>
</xsl:text>
<partOfSpeech><xsl:value-of select="normalize-space(partOfSpeech)"/></partOfSpeech><xsl:text>
</xsl:text>
<verbRoot><xsl:value-of select="normalize-space(verbRoot)"/></verbRoot><xsl:text>
</xsl:text>
<desc><xsl:value-of select="substring-before(normalize-space(desc), ' ')"/></desc><xsl:text>
</xsl:text>
<possessedForm><xsl:value-of select="normalize-space(possessedForm)"/></possessedForm><xsl:text>
</xsl:text>
<xsl:choose>
<xsl:when test="$transNum = 1">
<engTrans><xsl:value-of select="normalize-space(engTran1)"/></engTrans><xsl:text>
</xsl:text>
<dakSent><xsl:value-of select="normalize-space(dakSent1)"/></dakSent><xsl:text>
</xsl:text>
<engSent><xsl:value-of select="normalize-space(engSent1)"/></engSent><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="$transNum = 2">
<engTrans><xsl:value-of select="normalize-space(engTran2)"/></engTrans><xsl:text>
</xsl:text>
<dakSent><xsl:value-of select="normalize-space(dakSent2)"/></dakSent><xsl:text>
</xsl:text>
<engSent><xsl:value-of select="normalize-space(engSent2)"/></engSent><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="$transNum = 3">
<engTrans><xsl:value-of select="normalize-space(engTran3)"/></engTrans><xsl:text>
</xsl:text>
<dakSent><xsl:value-of select="normalize-space(dakSent3)"/></dakSent><xsl:text>
</xsl:text>
<engSent><xsl:value-of select="normalize-space(engSent3)"/></engSent><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="$transNum = 4">
<engTrans><xsl:value-of select="normalize-space(engTran4)"/></engTrans><xsl:text>
</xsl:text>
<dakSent><xsl:value-of select="normalize-space(dakSent4)"/></dakSent><xsl:text>
</xsl:text>
<engSent><xsl:value-of select="normalize-space(engSent4)"/></engSent><xsl:text>
</xsl:text>
</xsl:when>
</xsl:choose>
<sortField><xsl:value-of select="$sortString"/></sortField><xsl:text>
</xsl:text>
</entry><xsl:text>
</xsl:text><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="/">
<entries>
<xsl:for-each select="//entry">
<xsl:sort select="mdh:tweak(dakTerm)"/>
<xsl:variable name="tweakedTerm" select="mdh:tweak(dakTerm)"/>
<xsl:call-template name="basicEntry">
<xsl:with-param name="inputEntry" select="."/>
<xsl:with-param name="transNum" select="1"></xsl:with-param>
<xsl:with-param name="sortString" select="$tweakedTerm"></xsl:with-param>
</xsl:call-template>
<xsl:if test="string-length(normalize-space(concat(engTran2, dakSent2, engSent2))) gt 0">
<xsl:call-template name="basicEntry">
<xsl:with-param name="inputEntry" select="."/>
<xsl:with-param name="transNum" select="2"></xsl:with-param>
<xsl:with-param name="sortString" select="$tweakedTerm"></xsl:with-param>
</xsl:call-template>
</xsl:if>
<xsl:if test="string-length(normalize-space(concat(engTran3, dakSent3, engSent3))) gt 0">
<xsl:call-template name="basicEntry">
<xsl:with-param name="inputEntry" select="."/>
<xsl:with-param name="transNum" select="3"></xsl:with-param>
<xsl:with-param name="sortString" select="$tweakedTerm"></xsl:with-param>
</xsl:call-template>
</xsl:if>
<xsl:if test="string-length(normalize-space(concat(engTran4, dakSent4, engSent4))) gt 0">
<xsl:call-template name="basicEntry">
<xsl:with-param name="inputEntry" select="."/>
<xsl:with-param name="transNum" select="4"></xsl:with-param>
<xsl:with-param name="sortString" select="$tweakedTerm"></xsl:with-param>
</xsl:call-template>
</xsl:if>
<xsl:text>
</xsl:text>
</xsl:for-each>
<!--<xsl:for-each select="//entry">
<xsl:sort select="mdh:tweak(dakTerm)"/>
<xsl:variable name="term" select="dakTerm"/>
<orig><xsl:attribute name="id" select="normalize-space(id)"/><xsl:value-of select="normalize-space($term)"/></orig><xformed><xsl:value-of select="normalize-space(mdh:tweak($term))"/></xformed><xsl:text>
</xsl:text>
</xsl:for-each>-->
<!--<xsl:for-each select="//entry">
<xsl:sort select="mdh:tweak(dakTerm)"/>
<xsl:variable name="term" select="dakTerm"/>
<xsl:value-of select="normalize-space($term)"/><xsl:text>
</xsl:text>
</xsl:for-each>-->
</entries>
</xsl:template>
</xsl:stylesheet>
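For anyone who wants to experiment with the sort-key idea outside XSLT, here is a rough Python re-expression of the mdh:tweak logic above. It is a sketch, not a replacement for the stylesheet: it assumes plain code-point ordering when sorting, and the sample strings in the usage note are made-up letter sequences rather than real Dakelh terms.

```python
import re

NO_TERM = "__NO DAKELH TERM IN THIS ENTRY."


def sort_key(term):
    """Build a sortable string from a term, mirroring the steps in mdh:tweak."""
    if term == "":
        return NO_TERM
    # Equivalent of normalize-space() followed by lower-case().
    s = " ".join(term.split()).lower()
    # Replace accented vowels with their unaccented equivalents.
    s = s.translate(str.maketrans("áéíóú", "aeiou"))
    # Remove combining underscores (U+0331 and U+0332).
    s = re.sub("[\u0331\u0332]", "", s)
    # Replace the second letter of each digraph with a character that
    # sorts after 'z', so the digraph sorts after all single letters.
    s = s.replace("ng", "n{")
    s = re.sub(r"(c|g|k|l|s|w)h", r"\1}", s)
    s = re.sub(r"(d|t)l", r"\1~", s)
    s = s.replace("oo", "o¥")
    s = s.replace("ts", "t¦")
    s = re.sub(r"(g|k)w", r"\1§", s)
    s = s.replace("dz", "d©")
    # Apostrophes play no part in the sort order.
    return s.replace("'", "")
```

With this key, sorted(["kha", "kza", "ka"], key=sort_key) puts "kha" last, because its key "k}a" compares greater than "kza" under code-point ordering.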
That makes nine with last week's half-finished one. They'll need quite a lot of cleanup too. That's a priority for next week. They're appropriately named, and in the "incoming" folder. Three are pairs in the same image, and will need to be split.
The Subversion rollout is working well. GBS is now using it, and CC has been set up with it on Carrot (although CC isn't editing XML right now). Only LCC is left.