On a deadline for the Properties stuff, and got off to a slow start because of bad data relating to complex transactions...
GM is now linking from the Ville-Thierry to existing references.
I've spent most of the day working on building the new transaction view in the form of a spreadsheet, and I think it's about 2/3 done. I have ethnicity-munging working in the XSLT, as well as munging of institution types for multiple institutions. In the process, I've found and fixed some data-entry issues (as always).
I now have a two-stage XSLT process. The first stage, working on the XML output from the db, takes several minutes; this generates a view of the database which is centred specifically on transactions, importing all relevant data for each transaction into the transaction itself. Once this is done, the second-stage process then processes that output to create a spreadsheet; this is much faster (a few seconds), and it's easier to work on and test the second process without having to run the first process over and over again.
I wrote a quick bash script that lets me start or stop a local instance of tomcat with a single click. If you ONLY intend to ever run one tomcat at a time this will work pretty well.
It uses catalina.sh instead of startup.sh, and sets the CATALINA_PID variable to write a file containing the pid of the launched tomcat.
It first checks to see if there is a pid file at the location set by CATALINA_PID. If there is, the script reads the file and, making the assumption that you want to stop the running tomcat, calls 'catalina.sh stop', waits a few seconds and checks for the pid file again. If the file still exists it runs kill -9 on the pid, hopefully *really* stopping tomcat.
If there is no pid file we assume that tomcat is not running, and run the launch command. In my case I set the java version first, then provide a path for the PID variable, then run 'catalina.sh start'
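The script itself is not reproduced in this post, so here is only a rough Python sketch of the toggle logic described above, not the bash script. The function name, the return values and the injectable run/kill/sleep parameters are made up for illustration, and removing a stale pid file after the forced kill is my own assumption.

```python
import os
import signal
import subprocess
import time
from pathlib import Path


def toggle_tomcat(catalina_sh, pid_file, wait_seconds=5,
                  run=subprocess.run, kill=os.kill, sleep=time.sleep):
    """Stop Tomcat if a pid file exists, otherwise start it.

    Returns "started", "stopped" or "killed" (hypothetical labels).
    """
    pid_path = Path(pid_file)
    # catalina.sh uses CATALINA_PID to decide where the pid file lives.
    env = dict(os.environ, CATALINA_PID=str(pid_path))

    if not pid_path.exists():
        # No pid file: assume Tomcat is not running and launch it.
        run([catalina_sh, "start"], env=env, check=True)
        return "started"

    # A pid file exists: assume the user wants to stop the running instance.
    pid = int(pid_path.read_text().strip())
    run([catalina_sh, "stop"], env=env, check=False)
    sleep(wait_seconds)

    if pid_path.exists():
        # The pid file is still there, so fall back to the equivalent of kill -9.
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process had already gone away
        # Assumption: clear the stale pid file so the next click starts Tomcat.
        pid_path.unlink(missing_ok=True)
        return "killed"
    return "stopped"
```

Calling toggle_tomcat with the path to catalina.sh and a pid file location starts Tomcat on the first call and stops it on the second, which is the single-click behaviour described above.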
More details in the code comments.
UPDATED FOR MAC OS: added OS X-specific alerts.
HCMC website - Cascade
Spent time today reviewing HCMC site again with thoughts of what to include/exclude etc. in the new site.
Primaries and secondaries have been created with some content included. Significant editing of previous content required for new website. Creation of new blocks and request for content in progress.
ER sent me copy of page with order forms on it for review.
Functionality seems fine with one exception: if user selects something, goes to shopping cart page, then returns to Malahat ordering page, the only way to get to the shopping cart page again is to buy an additional item and then in the shopping cart page delete the extra item. Page would probably have to be changed to php to allow me to grab the session id and display a "go to shopping cart" button.
Bunch of standards violations (elements not closed, uppercase element names, ampersands not escaped, missing quotation marks around attribute values). In anticipation of request, I rewrote page so that it validates against xhtml 1.1.
Also added bit of javascript and html div elements so that 3 order forms are hidden by default and the appropriate one is shown when the user clicks on a link on the page.
MF reported a missing file in the Klondike site. Compared current production site against archive from 2008 and found a number of files missing, so uploaded the missing files to the production site. Then went through all the other sites and uploaded any missing files. Then synchronized the production sites against the backup on my computer.
Noted that there are three non-site folders in the sites folder on the production site (and thus not backed up anywhere). Emailed MF to ask what she wants done with those.
week of Apr 16 - Apr 20 M 0, T -1 DSA appointment, W 0, R +0.5 hold fort, F +0.5 faculty meeting
week of Apr 23 - Apr 27 M 0, T +1.0 dh ctte followup, W -0.5 CSG pickup, R 0.5 hold fort, F 0
Have included more temporary "content" within existing site structure to demo at Wednesday's meeting. Other features now included: links
Added the Spanish 100B tests for 2005 to the website (received from NM and built with my special source files). Presumably 2011 will be coming soon.
My P4 to P5 conversion is now working, and producing valid output on abstracts, entries, and the project metadata file. I may do more work on this, but I'll be moving on to the -ography stuff next.
More work ahead of release next month: introducing new recommendation to use xml-model.
I have a script currently generating a list of candidate duplicate owners. This is how it was done:
#!/bin/bash
# This script runs a series of comparison tests on xml-encoded owner
# records in an attempt to discover possible duplicates, which are then to be
# investigated manually by the PI.

# Threshold below which to consider a possible dupe
MINSIM=0.1

# First, paths to files.
USM_JAR=/home/mholmes/WorkData/netbeans/uniSimMetric/dist/uniSimMetric.jar
NCD_COMMAND="ncd -l "
INPUTFILE=/home/mholmes/WorkData/history/stanger-ross/properties/xml/owners_12_04_27_flattened.txt
OUTFILE="/home/mholmes/WorkData/history/stanger-ross/properties/xml/owner_dupe_candidates_$(date +%Y%m%d).txt"

# Write the header to the output file, then append a blank line
# (using > for the second echo would overwrite the header).
echo "Possible duplicate owners found by string comparison using USM" > "$OUTFILE"
echo "" >> "$OUTFILE"

# Initialize a line counter
C=0

# Read in the inputs line by line
while read -r line
do
    C=$((C+1))
    # Ignore empty lines. This ensures we can read five lines forward (there are five empty lines at the end of the file).
    LEN=${#line}
    if [ "$LEN" -gt 3 ]
    then
        # Compare this line with each of the next five lines.
        for ((N=C+1; N<C+6; N++))
        do
            STR2=$(awk "NR==${N}" "$INPUTFILE")
            # Call the USM to compare them.
            USM=$(java -jar "$USM_JAR" -compare -str1="$line" -str2="$STR2")
            # Call NCD to compare them
            # NCD=`$NCD_COMMAND "$line" "$STR2"`
            # NCD outputs the second string on the command line before the score; we need to remove it.
            # NCD=${NCD/$STR2}
            # If the USM score is below the threshold, output info to the output file.
            # awk compares the two values as numbers; [[ a < b ]] would compare them as strings.
            if awk -v usm="$USM" -v min="$MINSIM" 'BEGIN { exit !(usm + 0 < min + 0) }'
            then
                echo "Found similarity"
                echo "$line" | sed -n 's/.*<owners><own_owner_id>\(.*\)<\/own_owner_id>.*/\1/p' >> "$OUTFILE"
                echo "$STR2" | sed -n 's/.*<owners><own_owner_id>\(.*\)<\/own_owner_id>.*/\1/p' >> "$OUTFILE"
                echo "" >> "$OUTFILE"
            fi
        done
    fi
done < "$INPUTFILE"

# Display the output file.
gedit "$OUTFILE"
echo "Done!"
exit
This is successfully producing a list of candidate matches right now, outputting the ids of the two candidates followed by a blank line, for each candidate match.
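For comparison, here is a minimal Python sketch of the same windowed comparison: each record is checked against the next five, and a pair is reported when the score falls below the threshold. The distance function is a stand-in built on difflib from the standard library, not the USM; the function and parameter names are hypothetical.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in distance in [0, 1], where 0 means identical. Not the USM."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def candidate_pairs(records, window=5, threshold=0.1, dist=distance):
    """Yield (i, j, score) for records at most `window` lines apart
    whose distance is below `threshold`."""
    for i, first in enumerate(records):
        if len(first) <= 3:
            continue  # skip blank or near-empty lines, as the script does
        for j in range(i + 1, min(i + 1 + window, len(records))):
            score = dist(first, records[j])
            if score < threshold:
                yield i, j, score


# Made-up records, purely for illustration.
sample = ["Smith, John A.", "Smith, John A", "", "Wong, Mary"]
for i, j, score in candidate_pairs(sample):
    print(i, j, round(score, 3))
```

Passing a different callable as dist would let the same loop be driven by whatever metric is actually in use.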
Arranged and confirmed meeting next week with DF, DR, SA and myself to discuss next steps with their Cascade website.
Received payment for HB's 2012 1st quarter.
Deposited payment; receipt filed in HCMC records.
Release is due in three weeks. Working on additions to the header chapter.
Isolated seven more problematic transactions, and fixed two of them (bad dates); started creating xsl:keys to speed up processing, and mapped tract and property information into transactions.
Finally figured out how to successfully implement a date range search.
In plugins/vpn-search/form_advanced.php page, I added the two elements to the search form.
In plugins/vpn-search/vpn-search.php, I modified the getAdvancedConditons method (which constructs the conditions for the WHERE clause in the SQL query) by adding special cases to deal with table.field values of poems.po_date and poems.po_date2. There is no po_date2 field in the poems table, so the special case code inserts "po_date" instead of "po_date2". The reason I'm using a bogus table.field identifier is explained below
In plugins/vpn-search/classes/VPNFormBuilder.php, the fieldMap array uses a table.field name as the key and the id of the element in the GUI as the value. Code assumes a one-to-one relationship between GUI elements and table.field specifier. I needed to add two new key-value pairs, but the two new elements both are associated with the same table.field. I can't use the same key for more than one value, so I created a bogus table.field value (poems.po_date2) and then special-cased that value in the code in vpn-search.
At the moment, the date-range-start and date-range-end fields accept four digit years, and if crazy values are inserted, no checking is done and the user just gets 0 hits.
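The real code is PHP and is not shown here, so the following is only a Python sketch of the bogus-key idea, under stated assumptions: the two GUI ids are taken from the field names mentioned above, the comparison operators and the %s placeholder style are guesses, and the four-digit year check is a possible addition rather than something the current code does.

```python
import re

# Keys are table.field specifiers, values are GUI element ids.
# poems.po_date2 does not exist in the table; it only keeps the keys unique.
FIELD_MAP = {
    "poems.po_date": "date-range-start",
    "poems.po_date2": "date-range-end",
}

# Assumed operators: start of range is a lower bound, end is an upper bound.
OPERATORS = {"poems.po_date": ">=", "poems.po_date2": "<="}

FOUR_DIGIT_YEAR = re.compile(r"[0-9]{4}")


def date_conditions(form_values):
    """Return (sql_fragments, params) for the date range fields."""
    fragments, params = [], []
    for key, op in OPERATORS.items():
        value = form_values.get(FIELD_MAP[key], "").strip()
        if not FOUR_DIGIT_YEAR.fullmatch(value):
            continue  # blank or malformed input: skip it instead of searching
        # Special case: map the bogus identifier back to the real column.
        column = key.replace("po_date2", "po_date")
        fragments.append(f"{column} {op} %s")
        params.append(int(value))
    return fragments, params
```

With both fields filled in, this yields two conditions on the same real column, which is the effect the special-casing in vpn-search.php is after.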
Added code to the scraper php file so it:
- displays the status of the processing of each record
- generates a table of contents file with the name of each htm file containing a CircleMagic player and link to it, and the name of each record in the DevMS Wikibook and a link to that record's URL, and includes a link to that TofC page at the end of the report log on the php page.
Wrote a readme.txt file which details
- the required folders and files
- the structure of the XML needed for circleMagic
- naming conventions for the files generated by the scraper
- how the 2 template files work
- notes on wikibooks API and GUI for testing
Added a bunch of inline documentation to the scraper.php file.
Ran it on the full DevMS wikibook and generated 241 x 2 = 482 files. CC reviewed and approved.
Bash commands that combine sudo with a redirect can fail on the redirect, because the redirect is performed by your own (unprivileged) shell before sudo even runs; the elevated permissions apply only to the command that sudo launches.
For example, the following failed for me with a permission denied error:
sudo echo "something important" >> /etc/apt//mirror.list
It failed because everything after the final double-quote is a redirect handled by the calling shell, and not part of the echo command that sudo runs. The solution is to run the whole thing, redirect included, in a shell started by sudo:
sudo bash -c "echo \"something important\" >> /etc/apt//mirror.list"
Suggestion from JS-R:
Maybe three variables. 1. A "true"/"false" variable--is at least one of the buyers an institution? 2. If true, what is the institution type: (1) Ethnic (2) Private (3) Public (4) Multiple 3. If there is only one institution among the buyers, what is its name?
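A minimal Python sketch of how those three variables might be derived for one transaction. The input shape (a list of buyers, each with a name and an institution type that is None for individuals) and the function name are assumptions, as is my reading of "Multiple" as meaning more than one distinct institution type among the buyers.

```python
def institution_variables(buyers):
    """buyers: list of dicts with 'name' and 'institution_type'
    ('Ethnic', 'Private', 'Public', or None for an individual)."""
    institutions = [b for b in buyers if b.get("institution_type")]

    # Variable 1: is at least one of the buyers an institution?
    if not institutions:
        return {"has_institution": False,
                "institution_type": None,
                "institution_name": None}

    # Variable 2: the institution type, or "Multiple" if the types differ.
    types = {b["institution_type"] for b in institutions}
    inst_type = types.pop() if len(types) == 1 else "Multiple"

    # Variable 3: the name, only when exactly one buyer is an institution.
    name = institutions[0]["name"] if len(institutions) == 1 else None

    return {"has_institution": True,
            "institution_type": inst_type,
            "institution_name": name}
```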
Out of the office for a few hours to go to VIU, so stayed late to keep projects moving forward...
Up to Duncan to hear JS deliver a talk on "Drop the Digital". Excellent stuff.
Made significant progress today writing XSLT to convert the rather screwed-up P4 encoding we currently have into P5. Lots of time spent on a few small issues, such as converting <handList> and contents, whose more obscure attributes don't map easily onto the P5 <handNote>. I now have valid working output from my sample abstract and entry files. However, there is more I'd like to do in terms of tweaking attribute values such as @type on <div>, and @xml:ids. I also want to try to pull out some key information (dates etc.) which is currently available in the transcription and/or in attributes such as @n or @id, and record it formally in the headers.
Following that, I'll need to convert the project metadata file, and then the dreaded ography stuff, which is not TEI at all.
Updated website with new employment opportunity information.
Sent email confirming update completion to BAK (cc'd SA)
Working on a TEI problem, and trying to get through this week's lectures in the NLP course.
Tweaked the XSLT to handle situations in which an <rs> tag was linked through @ref to multiple <person> ids. Set up a new catalogue file and ographies.xml file for the Harper part of the project, and went through setting up transformation scenarios with KT on her system. Not sure yet whether we should be version-controlling the mvp.xpr file in which the scenarios may be stored; each of us is likely to do different things in that file, so it may be simpler just to set up new scenarios on any system which is being used for editing.
Confirmed recent equipment purchases against FAST account; filed hardcopies
Entered new equipment in HCMC inventory data-base.
Working on a remedy for AdBlock filters blocking some ids of elements on Guidelines page; created a stylesheet to implement adding a tei_ prefix in a post-processing stage after the guidelines have been generated.
Running hard to stay in the same place...
...described in this post.
Various solutions to 7-source constraint imposed by CircleMagic
<!-- kludge that generates solid first ring (which provides no information) and second ring of N coloured wedges with no grey ring outside -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
</detail>
<detail>
<id frequency="2">Cultures18-sub1-1</id>
<id frequency="2">Cultures18-sub1-2</id>
<id frequency="2">Cultures18-sub1-3</id>
<id frequency="2">Cultures18-sub1-4</id>
<id frequency="2">Cultures18-sub1-5</id>
<id frequency="2">Cultures18-sub1-6</id>
<id frequency="2">Cultures18-sub1-7</id>
<id frequency="2">Cultures18-sub1-8</id>
<id frequency="2">Cultures18-sub1-9</id>
</detail>
<detail>
</detail>
</details>
</source>
Above model could obviously be extended to include another ring for Major/Minor. We could use that first ring to indicate staff or public, then within each of those categories have a wedge for each author and the third ring for major/minor for each author.
One problem with any approach that uses the innermost ring as a placeholder is that if the user clicks on that ring, the rest of the circle is greyed out, and that effect makes more sense if the innermost ring is authors, not some grouping of authors.
Just finished week 5 problem sets; I'm basically a week behind because of the TEI Council meeting, but just staying within the deadlines...
The xml output that my code generates for The Devonshire Manuscript page on the wiki caused the CircleMagic display to throw an error. Took a couple of hours to figure out the problem.
1) Discovered that if you have more than 7 source elements in the XML file, CircleMagic generates an error message rather than displaying the data. The absolute and relative sizes of the counts in each source element don't seem to matter. I tested to see if there is a similar limit on the number of detail elements, and stopped testing at 18 details within one source.
Here's the structure for a source which kind of solves the problem, by using one source and N detail elements in that source, resulting in the innermost ring being all one colour and the second ring divided into N wedges. (The empty third detail element suppresses the display of the black outer ring.):
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
</detail>
<detail>
<id frequency="2">Cultures18-sub1-1</id>
<id frequency="2">Cultures18-sub1-2</id>
<id frequency="2">Cultures18-sub1-3</id>
<id frequency="2">Cultures18-sub1-4</id>
<id frequency="2">Cultures18-sub1-5</id>
<id frequency="2">Cultures18-sub1-6</id>
<id frequency="2">Cultures18-sub1-7</id>
<id frequency="2">Cultures18-sub1-8</id>
<id frequency="2">Cultures18-sub1-9</id>
</detail>
<detail>
</detail>
</details>
</source>
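A rough Python sketch of generating that workaround structure with the standard library's ElementTree. The element and attribute names come from the XML above; the function name is hypothetical, and setting the top-level frequency to the sum of the wedge frequencies is an assumption based on the example (9 x 2 = 18). ElementTree serializes the empty detail elements as self-closing tags, and I have not checked whether CircleMagic treats those the same way.

```python
import xml.etree.ElementTree as ET


def single_source_workaround(label, wedges):
    """Build one <source> with an empty first <detail>, one <id> per
    wedge in the second <detail>, and an empty third <detail>.

    wedges: list of (name, frequency) pairs.
    """
    source = ET.Element("source")
    total = sum(freq for _, freq in wedges)
    top = ET.SubElement(source, "id", frequency=str(total))
    top.text = label

    details = ET.SubElement(source, "details")
    ET.SubElement(details, "detail")           # empty first detail
    second = ET.SubElement(details, "detail")  # one id per coloured wedge
    for name, freq in wedges:
        wedge = ET.SubElement(second, "id", frequency=str(freq))
        wedge.text = name
    ET.SubElement(details, "detail")           # empty third detail
    return source


example = single_source_workaround(
    "Cultures18", [(f"Cultures18-sub1-{n}", 2) for n in range(1, 10)]
)
print(ET.tostring(example, encoding="unicode"))
```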
2) Did various tests of xml structures to see what would be output, with following results:
<!-- generates a "problem with data" error -->
<source>
<id frequency="18">Cultures18</id>
</source>
<!-- generates a ring of coloured wedges too big to fit into the viewport -->
<source>
<id frequency="18">Cultures18</id>
<details>
</details>
</source>
<!-- generates ring of coloured wedges, what you want for 1-level detail with 7 or fewer sources -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
</detail>
</details>
</source>
<!-- generates ring of coloured wedges with black/dark grey ring outside -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
<id frequency="9">Cultures18-sub1</id>
<id frequency="9">Cultures18-sub2</id>
</detail>
</details>
</source>
<!-- generates two rings of coloured wedges -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
<id frequency="9">Cultures18-sub1-1</id>
<id frequency="9">Cultures18-sub1-2</id>
</detail>
<detail>
</detail>
</details>
</source>
<!-- generates two rings of coloured wedges with black/dark grey ring outside -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
<id frequency="9">Cultures18-sub1-1</id>
<id frequency="9">Cultures18-sub1-2</id>
</detail>
<detail>
<id frequency="6">Cultures18-sub2-1</id>
<id frequency="6">Cultures18-sub2-2</id>
<id frequency="6">Cultures18-sub2-3</id>
</detail>
</details>
</source>
<!-- generates three rings of coloured wedges -->
<source>
<id frequency="18">Cultures18</id>
<details>
<detail>
<id frequency="9">Cultures18-sub1-1</id>
<id frequency="9">Cultures18-sub1-2</id>
</detail>
<detail>
<id frequency="6">Cultures18-sub2-1</id>
<id frequency="6">Cultures18-sub2-2</id>
<id frequency="6">Cultures18-sub2-3</id>
</detail>
<detail>
</detail>
</details>
</source>
Based on our meeting last week, I've drafted a proposal for the HCMC committee for the port of the project to a pure eXist implementation with enhanced searching, NLP topic discovery, etc. Sent to JL for comments.
At JS-R's request, dealt with the problem of owner names which were prefixed with "Transfer: " in the following way:
Created a new column for this info:
ALTER TABLE `owners` ADD COLUMN `own_diff_trans_name` BOOLEAN DEFAULT False NOT NULL AFTER `own_display_name`;
Checked how many rows would be affected (139):
SELECT * FROM `owners` WHERE LEFT(`own_display_name` , 10 ) = "Transfer: ";
Moved the info from the display name field to the new field:
UPDATE `owners` SET `own_diff_trans_name` = True WHERE LEFT( `own_display_name` , 10 ) = "Transfer: ";
UPDATE `owners` SET `own_display_name` = REPLACE(`own_display_name`, "Transfer: ", "");
Checked the results:
SELECT * FROM `owners` WHERE `own_diff_trans_name` = True;
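A small self-contained Python sketch of the same flag-then-strip sequence, run against an in-memory SQLite table rather than the MySQL database (so substr stands in for LEFT, and the sample rows are made up). One deliberate difference from the statements above: the second update is restricted to the flagged rows and removes only the leading prefix, since an unrestricted REPLACE would also alter any name containing "Transfer: " elsewhere in the string.

```python
import sqlite3

PREFIX = "Transfer: "

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE owners ("
    " own_owner_id INTEGER PRIMARY KEY,"
    " own_display_name TEXT NOT NULL,"
    " own_diff_trans_name BOOLEAN NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO owners (own_display_name) VALUES (?)",
    [("Transfer: Smith, John",), ("Wong, Mary",), ("Estate Transfer: Lee, Ann",)],
)

# Flag the rows whose display name starts with the prefix.
conn.execute(
    "UPDATE owners SET own_diff_trans_name = 1"
    " WHERE substr(own_display_name, 1, ?) = ?",
    (len(PREFIX), PREFIX),
)

# Strip the prefix from the flagged rows only (substr is 1-indexed).
conn.execute(
    "UPDATE owners SET own_display_name = substr(own_display_name, ?)"
    " WHERE own_diff_trans_name = 1",
    (len(PREFIX) + 1,),
)

rows = conn.execute(
    "SELECT own_display_name, own_diff_trans_name FROM owners"
    " ORDER BY own_owner_id"
).fetchall()
print(rows)
```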
Work on tickets arising out of Michigan meeting.
Catching up after trip to Michigan...
415 emails to get through on my return from TEI Council meeting. Also completed expenses claim for trip.
Bio and keywords added (from HT).
Reviewed the situation with the ETCL site and the fact that none of the links work. Some generate a 404 error, others display a "What's New" page with banner and right column, but no page-specific content. Compared the production site UI and wp-admin settings with my dev site. Did a bunch of playing around with URLs and fiddling with configuration settings in the wp-admin in the dev site. Made no changes at all to the production site.
The problem seems to be an incompatibility between the WP3.1 environment (specifically the way links are handled) and code somewhere in the ETCL theme.
I will next compare the ETCL theme against the twentyten theme in the dev environment as the twentyten theme supports navigation that the ETCL theme doesn't.
I'm still not sure whether I should start with the ETCL theme and modify the php in it as needed to work with the WP3.1 API for links (which is my preference), or whether I should start with a generic theme and modify it to produce a look and feel similar to the current ETCL (except of course that the links would work).
So while reading the Apollodorus text I was looking at the events in section 3.10.2 and saw that the place xml:id=pylos was showing up on the site as "pylospylus"; "pylus" is the "AKA" for "pylos".
After discussing this with Greg we established that it's a coding error. The "placeName" is being pulled for both, and since "pylus" is under "placeName" too, it's getting attached to "pylos". We looked at other places and saw that "corinth" and "ephyra" are doing the same thing, because "ephyra" is under the "placeName" "AKA" too.
Greg is going to look into it and fix the coding.
One task from the meeting the other day was to create a new view of transactions which adds a number of financial fields. I've now done that, using the following SQL:
DROP VIEW IF EXISTS VW_trans_composite_eth_prop_2;
CREATE VIEW VW_trans_composite_eth_prop_2 AS (
  SELECT
    seller_titles.ttl_title_id AS seller_title_id,
    seller_titles.ttl_date AS seller_title_date,
    seller_titles.ttl_title_code AS seller_title_code,
    seller_titles.ttl_consideration AS seller_title_consideration,
    seller_titles.ttl_declaredvalue AS seller_title_declaredvalue,
    seller_titles.ttl_marketvalue AS seller_title_marketvalue,
    buyer_titles.ttl_title_id AS buyer_title_id,
    buyer_titles.ttl_date AS buyer_title_date,
    buyer_titles.ttl_title_code AS buyer_title_code,
    DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) AS seller_duration_days,
    buyer_titles.ttl_consideration AS buyer_title_consideration,
    buyer_titles.ttl_declaredvalue AS buyer_title_declaredvalue,
    buyer_titles.ttl_marketvalue AS buyer_title_marketvalue,
    census_tracts.census_tract_code,
    props.*,
    sellers.concat_owners AS concat_sellers,
    sellers.concat_ethnicities AS seller_ethnicities,
    sellers.total_owners AS total_sellers,
    sellers.munged_ethnicity AS seller_munged_eth,
    sellers.total_institutional AS institutional_sellers,
    buyers.concat_owners AS concat_buyers,
    buyers.concat_ethnicities AS buyer_ethnicities,
    buyers.total_owners AS total_buyers,
    buyers.munged_ethnicity AS buyer_munged_eth,
    buyers.total_institutional AS institutional_buyers
  FROM titles AS buyer_titles
    LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk
    LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk
    LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id
    LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id
    LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id
    LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id
  ORDER BY seller_titles.ttl_title_id
)
Made copies of a number of ta (tax assessment) files, renamed them and edited them to work with the building_permits table in the db. Did this in the dev instance in my account on our server.
Files I've added:
site_root/inc/bpformbasic.inc
site_root/inc/bpresults.inc
site_root/search/searchbp.php
The link to the searchbp.php page is on the tax assessment page (ta/taxassessment.php).
Emailed JL and PD for guidance on which fields to include in search interface(s) and which results fields are primary and which secondary.
More headaches with svn than the actual code (as usual). I had in the repo and on my local drive a folder (bp) containing 1 file. I did an svn delete and the file was deleted on the local instance, but not the folder. When I then did a commit, I got a file out of date error. When I tried various ways of sorting out this problem I ended up consistently getting "'/path/on/local/drive/' remains in conflict" errors.
Googled it and discovered I'm not alone. Quite a number of people renaming, moving or deleting files have everything go smoothly except for one file or folder that somehow gets into "conflict".
Solution:
svn resolved path/to/conflicted folder
svn update path/to/conflicted folder
svn commit -m "resolving conflicted folder or whatever"
SA and I attended meeting with CW re: writing for the web.
Informative and worthwhile session. CW provided handout and will forward PowerPoint presentation to us also for future reference.
Received from BAK requests to update website.
Have now updated their website with several new additions for:
Announcements, Courses (removed old, included new), new wording for index page content, additional title in left navigation.
More content forthcoming regarding announcements, new course description, and study abroad program.
New internship poster also included on website.
Emailed BK (cc'd SA) confirming updates made to website to date.
Had a directory with millions of symlinks. I needed to move the directory without the dependencies imposed by the symlinks. What I wanted was to 'convert' the symlinks to actual files - that is, replace the symlink with a copy of its target file. Found this, which worked a treat. Here's the actual code:
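#!/bin/bash
# Replace every symlink in the current directory with a copy of its target.
for file in *
do
    # readlink prints the target of a symlink, and nothing for a regular file.
    link=$(readlink "${file}")
    if [ "${link}" ]
    then
        # Remove the symlink, then copy the target file into its place.
        rm "${file}"
        cp -v "${link}" "${file}"
    fi
done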
User reported on the social sciences instance of Agenda that if you click on print timetable view, all that happens is that the view changes to list view. After some experimentation I noticed that if I click on timetable view, then click on Print Timetable view, the printable view appears in a new tab and the original tab goes to list view. If I then click on timetable view again and click on Print Timetable view again (with the second tab still there), then the second tab is updated but focus doesn't go there, so all I see is that the first tab goes from timetable view to list view. This is fairly conventional browser behaviour for dependent windows (usually popups).
Emailed user to get confirmation that's the problem. If so, I'll see what if anything I'm able to do to change the behaviour. It may not be necessary if she knows to just click on the second tab.
Created a building permits table in the vihdev db. Noticed that to auto-increment the building_permit_id field, you have to reference a sequence, so created the necessary sequence modelled on others I found in the db (census tables).
Processed the raw data file (spreadsheet) into normalized data: typed the data fields that I could (e.g. int or date) and normalized the data to comply with the constraints I had established (e.g. length of varchar fields). Saved that as a CSV (rather than tab-delimited) as the documentation seemed to favour the CSV approach.
The only substantial fiddling I had to do with the data was for the records whose date field was only a year (e.g. 1889): I arbitrarily assigned them the 1st of January (e.g. 18890101), as the date field requires 8 digits.
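The year-only rule is simple enough to show as a small Python sketch. The function name is hypothetical, and rejecting anything that is neither 4 nor 8 digits is my own assumption about how odd values should be handled.

```python
import re


def normalize_permit_date(raw):
    """Return an 8-digit YYYYMMDD string.

    A bare 4-digit year is arbitrarily assigned the 1st of January.
    """
    value = raw.strip()
    if re.fullmatch(r"[0-9]{4}", value):
        return value + "0101"
    if re.fullmatch(r"[0-9]{8}", value):
        return value
    raise ValueError(f"unexpected date value: {raw!r}")


print(normalize_permit_date("1889"))
```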
Once that instance uploaded successfully, did the exact same thing in the production instance, just so I have a second copy of the thing somewhere.
Once that was all working in the dev instance of the db,
I've added new functionality to the stylesheet that builds the serOgraphies.xml file. As part of the information it gathers on each character, it also pulls out the beginning of each speech (<said>), and tells you what the nearest <rs> to that speech is. In other words, it calculates which epithet is most closely associated with each speech. This involved writing a couple of useful functions (hcmc:getTextOffsetBetweenTags and hcmc:getNearestRsTag) which will also be useful with the Map of London.
It's currently set to a distance limit of 250 characters (meaning letters/digits/glyphs, not literary characters), so that if there is no <rs> within 250 characters of the <said>, no epithet is returned. That distance can be changed easily in the stylesheet.
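The real work is done by the XSLT functions named above, which are not shown here. As a language-neutral illustration, here is a minimal Python sketch of the selection rule only, assuming each speech and each <rs> has already been reduced to a character offset in the text. The function name, the input shape, and measuring distance in either direction are assumptions.

```python
def nearest_rs(said_offset, rs_tags, limit=250):
    """Return the text of the <rs> closest to a <said>, or None.

    rs_tags: list of (character_offset, text) pairs.
    limit:   maximum distance in characters (250 in the current stylesheet).
    """
    best = None
    for offset, text in rs_tags:
        gap = abs(offset - said_offset)
        if gap <= limit and (best is None or gap < best[0]):
            best = (gap, text)
    return best[1] if best else None
```

Changing the limit is then a matter of passing a different value, much as the distance can be changed in the stylesheet.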
Still waiting for bio for this one, and keywords for both.
Met to discuss 3 proposals:
- Victorian Poetry Network (AC)
- Graves Diary Extension (EGW)
- 2-day TEI fundamentals workshop
Also discussed:
- write up policy on HCMC staff teaching
- incorporate that in larger HCMC mandate document
- submit both for review at next meeting
Finished processing files into tab-delimited text files for upload into the MySQL DB, following the process in the HowTo/HowToProcessColonistFiles.txt and HowTo/sql_for_load_data_file.txt documents.
Modified HowToProcess document a bit to capture new conventions for encoding cemetery plot identifiers.
Modified sql_for_load_data_file as it had the field names in the wrong order (cemetary followed transcript rather than preceded it in the list of fields)
Uploaded data for 1927, 1928, 1929, 1930, 1931, 1932 into the database.
Reorganized folders containing data files and synchronized the files on the server and the files on my Mac so I have identical copies of the data files and the website on both machines.
For many projects it will be useful to have a way of calling a java lib which can make a universal similarity metric measurement of two strings. I've started working from this documentation to create a class and the necessary wrappers to make this work. I'm still trying to resolve some dependencies, but I think this will be practical, and we'll be able to use the USM module in the context of oXygen (where we're allowed to use Saxon EE). The testbed for this will be the matching of ContentDM records with our TEI metadata for maps.
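To give a flavour of this kind of measurement, here is a minimal Python sketch of the normalized compression distance (NCD), a compression-based measure in the same family. It is not the Java library discussed above, and zlib is simply the compressor I picked for the sketch; the function names are hypothetical. Lower scores mean more similar strings.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up titles, purely for illustration.
first = "Map of the City of Victoria, British Columbia, 1889"
second = "Plan of Esquimalt Harbour surveyed by Captain Richards"
print(ncd(first, first), ncd(first, second))
```

On very short strings the compressor's fixed overhead dominates, so scores are only meaningful relative to each other.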
That's the Rees review.
Saw a broken link error that's been in the History site report for a couple of weeks now, so I fixed it.
Added page references to all vol 19 documents (based on the journal volume, not the book).
Put together an immediate and a longer term plan for the project; I'll detail these when I have a chance.
This is a long review with a bibliography, so it's not finished yet...
Met with JS-R and had long discussions about the next phase of the project, and immediate needs. Arising out of this, two short-term tasks for me:
Went to web coordinators meeting, chaired by Robin Sutherland.
Social Media policy : should be dynamic - i.e. answer user input quickly; no centralized control or process, they want people to try things and report to them; common sense on tone, content, management of user behaviour
Uvic.ca renovation : rejigging of the institutional site and subsequent rejigging of templates used for departmental sites; migrating of simple sites should be change of configuration values to point at new templates, css, and js files. Look somewhat different, feel more or less the same; using same DOM as current cms sites
Interesting task, directly relevant to some of the work we'll be doing on Mariage and Coldesp.
Dental appointment.
MF reported that links to one map were broken. The actual image files (and pages that support them) were not on the site. She's going to try to find the images and if so will send them to me and I'll create the pages and post them to the site.
Layout problem with the list of maps. The css on the .citation specified clear all, which caused those blocks to appear to the right of and below the nav bar rather than just to the right. I changed the css rule to clear right and that solved the problem without introducing any problems I could find.
Replaced the contents of the trunk, Alex branch and the backup branch with the files as updated by Martin and Greg, so they're up to date.
Gave AD all the svn, web account and db connection info he should need to get to the files, check them out, and post them to the web space for testing.
Met with ER to go over state of the ecommerce testing for their site. Settled on the best kind of solution, i.e. Beanstream's inventory and shopping cart system addressed by pages on the Malahat site, rather than everything on the Beanstream server (i.e. a complete custom-built store) or everything on the Malahat server (which again entails a custom-built store and then code to transact the actual financial details).
At the meeting, still having problems with the hash-value validation with the shopping cart page. Eventually solved that by emailing RE who put me onto someone in the bookstore (they'd written their own custom store) who put me onto a BeanStream guy. The Beanstream guy was good and within half an hour problem was solved. There are two checkboxes in the admin interface "require hash value" and "include hash value". The first must be not checked and the second checked for use with the shopping cart. My guess is that if the site is being addressed directly from pages on the client's server (i.e. with code that injects the correct hashvalue in the correct place in the submission), then both those have to be checked - but I'm not sure about that.
JL came in with a new data set he wants to have incorporated into the VIHistory site. He provided a spreadsheet of building permits issued between 1860 and 1921. Initially all we'll do is create a new table and a user interface to query and report that table. Later we'll worry about more global searches (e.g. of street addresses) that go through this data set as well as any others.
He'd like it done by end of April, but I'm not sure I can make that deadline.
Leaving early.
Notetaking on the second video for week 4.
Today I've done a lot of regex work on the TPS files to do the following:
<persName> tags to <rs>.
serOgraphies.xml for <rs> and <placeName> tags.
I've also beefed up the output so that the stats for speeches are now properly encoded and identified, and I've supplied a CSS file which makes it possible to read the stats in a browser from the serOgraphies.xml file. Also helped KT with a bit of troubleshooting with a file versioning/rollback issue.
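For reference, the <persName>-to-<rs> renaming is the kind of change a single regex can handle. This is only a minimal sketch in Python, not necessarily how the files were actually processed; the function name and the sample string are invented for illustration.

```python
import re

# Matches the start of an opening or closing persName tag; attributes are left alone.
# \b stops the pattern from matching a longer element name that merely begins with "persName".
PERSNAME_TAG = re.compile(r"<(/?)persName\b")

def persname_to_rs(text: str) -> str:
    """Rename <persName> elements to <rs>, keeping attributes and content unchanged."""
    return PERSNAME_TAG.sub(r"<\g<1>rs", text)

# Hypothetical sample line, for illustration only.
sample = '<persName ref="#Charles">Don Carlos</persName> spoke.'
print(persname_to_rs(sample))
```

A regex like this only renames the tags; it does not check that the result is still well-formed, so the files would need to be validated afterwards.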
Entered today all primaries, secondaries and some sub-secondaries on the site.
Remainder of site in progress.
Sent email to BB, SB (GRS) cc'd SA advising what has been done to date.
BB, SB both attending Cascade training session next week.
I've done some preliminary alignment with XSLT to find out which maps we have which can be matched with entries from ContentDM:
It seems likely that many of these items actually do match, but because they have no Penfold numbers or matching ids, I'll have to match them with some sort of fuzzy matching approach.
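One possible shape for that fuzzy matching, sketched in Python with the standard-library difflib module rather than XSLT. The titles, the 0.8 threshold and the function names are all assumptions made up for illustration; the real matching may need different normalisation or a different similarity measure.

```python
from difflib import SequenceMatcher

def normalise(title: str) -> str:
    """Lower-case and replace punctuation with spaces so trivial differences don't count."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())

def best_match(title, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate title, or None if below threshold."""
    scored = [
        (cand, SequenceMatcher(None, normalise(title), normalise(cand)).ratio())
        for cand in candidates
    ]
    if not scored:
        return None
    cand, score = max(scored, key=lambda pair: pair[1])
    return (cand, score) if score >= threshold else None

# Hypothetical titles, invented for illustration only.
contentdm_titles = ["Map of Vancouver Island, 1860", "Plan of the town of Victoria"]
print(best_match("Map of Vancouver Island 1860.", contentdm_titles))
```

Anything that scores below the threshold would still need to be checked by hand, and borderline matches above it probably should be too.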
I regenerated my map_lookup.xml file with a bit of added data:
xquery version "1.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
<maps xmlns="http://hcmc.uvic.ca">
{
  (: One <map> entry per TEI document in the collection. :)
  for $t in //tei:TEI
  return
    <map xml:id="{$t/@xml:id}">
    {
      (: Take only the first title found in the document. :)
      if ($t//tei:title) then
        <title>{($t//tei:title)[1]/text()}</title>
      else
        ()
    }
    {
      (: Penfold number and document id, where a Penfold number exists. :)
      if ($t//tei:idno[@type="penfoldNum"]) then
        (
          <penfold>{$t//tei:idno[@type="penfoldNum"]/text()}</penfold>,
          <docId>{$t//tei:idno[@type="doc_id"]/text()}</docId>
        )
      else
        ()
    }
    </map>
}
</maps>
Completed the report for PCA, who signed off yesterday, and sent it on to SD and EG-W.
In a couple of hours late; stayed one hour late.
I've now added placename handling to the -ography generation code. In the process, I discovered that the same identifiers have been used for people and places in a number of cases. These will be disambiguated by KT, but in the meantime, I've added traps for them so that the XML file which is generated does not have id collisions. The -ography file is now generated from itself, and can preserve any descriptive data in e.g. <p> or <desc> tags which already exist in the file, while replacing the counts etc.
I've sent detailed instructions to KT on how to use search-and-replace to transition to the use of <rs> instead of persName, and to a single -ography file from the separate people and places files maintained previously.
Now I'm working on counting speeches and lengths of speeches for each character.
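A rough sketch of that kind of count, in Python rather than the XSLT actually in use. It assumes each speech is tagged as a TEI <said who="#id"> element and measures length in whitespace-separated words; both are assumptions, and the sample text and the "Other" speaker are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"

def speech_stats(xml_text, tag="said"):
    """Return {speaker id: (number of speeches, total words)}.

    Assumes speeches are <said who="#id"> elements; the real markup may differ.
    """
    root = ET.fromstring(xml_text)
    stats = defaultdict(lambda: [0, 0])
    for speech in root.iter(TEI + tag):
        who = speech.get("who", "").lstrip("#")
        # itertext() includes the text of any nested elements as well.
        words = "".join(speech.itertext()).split()
        stats[who][0] += 1
        stats[who][1] += len(words)
    return {who: tuple(counts) for who, counts in stats.items()}

# Hypothetical sample, for illustration only.
sample = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <said who="#Charles">I shall go to the mine.</said>
  <said who="#Other">Not today.</said>
  <said who="#Charles">Tomorrow, then.</said>
</text>"""
print(speech_stats(sample))
```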
Four new correspondence documents from 1859 have been added to the correspondence, transcribed by Marion Massey and marked up by Petria Arienzale. The total document count is now 7151.
Going over to the bookstore to check the latest proof...
Built the complete P&A site structure down to the bottom level pages, and organized the navigation. Remaining to be done: additional-marketing and external-links.
I've written some simple XSLT to compile a file called serOgraphies.xml from the three input files KT says are basically ready. The entries look like this:
<item xml:id="Charles">
  <rs n="1">Charles</rs>
  <rs n="3">Charles Gould</rs>
  <rs n="6">Don Carlos</rs>
  <rs n="2">Don Carlos Gould</rs>
  <rs n="1">Gould</rs>
  <rs n="4">Señor Administrador</rs>
  <rs n="2">Señor Administrador of the San Tomé Mine</rs>
  <rs n="1">their Señor Administrador</rs>
</item>
The @n values are the counts of instances of that particular epithet, so "Charles" occurs once, "Charles Gould" occurs three times, and so on.
I found and fixed a few encoding errors and oddities in the transcription files at the same time.
This is generated from <persName> tags, but it's simple to change to <rs> tags, add <event>s, etc. It's likely that tagging in the text will shift to <rs> from <persName>, so that e.g. non-human characters such as animals can be accommodated.
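The counting behind the @n values can be sketched as follows. This is Python standing in for the XSLT, and it assumes each name is tagged along the lines of <persName ref="#Charles">Don Carlos</persName>; the @ref linking, the function name and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"

def epithet_counts(xml_text, tag="persName"):
    """Map each referenced id to a Counter of the epithets used for it.

    Assumes the id is carried in a ref="#id" attribute on each name tag.
    """
    root = ET.fromstring(xml_text)
    counts = defaultdict(Counter)
    for el in root.iter(TEI + tag):
        ident = el.get("ref", "").lstrip("#")
        # Collapse internal whitespace so line breaks don't create separate epithets.
        epithet = " ".join("".join(el.itertext()).split())
        counts[ident][epithet] += 1
    return counts

# Hypothetical sample, for illustration only.
sample = """<p xmlns="http://www.tei-c.org/ns/1.0">
<persName ref="#Charles">Don Carlos</persName> met
<persName ref="#Charles">Charles Gould</persName>; later
<persName ref="#Charles">Don Carlos</persName> left.</p>"""

for ident, counter in epithet_counts(sample).items():
    print(ident)
    for epithet, n in sorted(counter.items()):
        print(f'  <rs n="{n}">{epithet}</rs>')
```

Passing tag="rs" would cover the switch from <persName> to <rs> without any other change.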
I'm now doing most of this work in my free time, but I completed and submitted the third programming assignment this morning. Still managing to keep up and stay on schedule. :-)
Based on my meeting notes and some subsequent thinking, I've created a detailed plan for AT, for the Tarr markup, and sent it to him and KT.
week of Mar 26 - Mar 30
M 0, T +0.5 dean's request, W 0, R +0.5 hold fort, F +1.0 XSLT processing DevMS pages to text output on page
In late -- reno started at home.
Meeting with AT to discuss markup for Tarr. Made some preliminary decisions, which I'll summarize in an email tomorrow.
Added the first few new vessels to the vessels file, fixing some typos in the original transcription, confirming the existence and naming of the vessels, and finding some sources to get the researcher started. Lots more to do. I'm up to the John Stephenson.