31/08/11

Permalink 01:00:58 pm, by jamie, 992 words, 244 views   English (CA)
Categories: Notes; Mins. worked: 0

Notes upon my departure

Original website (i.e. Stew's version): http://web.uvic.ca/~lang02/bailey/

New development website (note: requires Netlink login): http://web.uvic.ca/~lang02/bailey_v2/

XML schema: http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng

The Data

The data has gone through a few schema changes, all documented on the blog. See, in chronological order (i.e. earliest first):

For each set of data that SD sends, I do the following:

  1. Make a new folder in BaileyCapitalTrials/data/ named after the year range for the data, e.g.: 1730-39

  2. Put SD's XLS file in that folder

  3. Follow the import steps to convert the XLS to XML, saving all of the step_*.xml files in the newly created folder: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8283&more=1&c=1&tb=1&pb=1

  4. Make a copy of step_3.xml and save it as (using 1730-39 as an example) 1730_1739_for_proofing.xml

  5. Send that file to SD for proofing

  6. Post on the blog that the data was received, processed, and sent for proofing

Once I get the file back from SD, I do the following:

  1. Save as 1730_1739_proofed_by_simon.xml in the proper folder (see above)

  2. Validate the file in Oxygen and fix any errors. There shouldn't be any at this point, but SD lets the odd typo slip through the cracks.

  3. Make a copy, name it 1730-1739.xml, and put it in BaileyCapitalTrials/data/finished_data/

  4. Once the file is 100% valid, import into the website: http://web.uvic.ca/~lang02/bailey_v2/import/index.php

  5. Post on the blog that the data was validated
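The file names used in the two checklists above follow a fixed pattern, so they can be derived from the year range. A minimal sketch, assuming the folder range always has the form YYYY-YY or YYYY-YYYY; the function names expand_range and workflow_paths are hypothetical and not part of the project:

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

DATA_ROOT = PurePosixPath("BaileyCapitalTrials/data")


def expand_range(year_range: str) -> Tuple[int, int]:
    """Turn a folder-style range such as '1730-39' into (1730, 1739)."""
    start_text, end_text = year_range.split("-")
    start = int(start_text)
    if len(end_text) == 4:
        end = int(end_text)
    else:
        # A two-digit end year borrows its century from the start year.
        end = (start // 100) * 100 + int(end_text)
    if end < start:
        raise ValueError(f"end year precedes start year in {year_range!r}")
    return start, end


def workflow_paths(year_range: str) -> Dict[str, PurePosixPath]:
    """Return the folder and file names used at each stage of the workflow."""
    start, end = expand_range(year_range)
    folder = DATA_ROOT / year_range
    return {
        "folder": folder,
        "for_proofing": folder / f"{start}_{end}_for_proofing.xml",
        "proofed": folder / f"{start}_{end}_proofed_by_simon.xml",
        "finished": DATA_ROOT / "finished_data" / f"{start}-{end}.xml",
    }


for stage, path in workflow_paths("1730-39").items():
    print(stage, path)
```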

Validation

As noted at the top of this document, the schema file is at http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng . SD validates against this file so it always needs to be current.

The Website

The development website ( http://web.uvic.ca/~lang02/bailey_v2/ ) is feature-complete and mostly design-complete (I went for a muted look), and it validates as HTML5. The site uses jQuery-driven JavaScript to handle AJAX requests.

On the search form, the jQuery plugin bsmSelect is used to make multiple select fields more user-friendly. The plugin homepage is here: http://plugins.jquery.com/project/bsmSelect , and the blog post is here: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=7864&more=1&c=1&tb=1&pb=1

The site search – both the form and the results – is handled by the Search suite of classes located at includes/classes (Search.php and the Search subdirectory). The Search_Form class, located at Search/Form.php, is responsible for displaying all of the search fields and populating them with the proper values. All of the classes are fully documented.

The site uses the Zend Framework's DB library for database interaction. The classes are located at includes/classes/Zend . All of the classes used in the site follow the PEAR class naming convention for easy autoloading. The PEAR convention dictates the following:

  • Class names should begin with an uppercase letter
  • Each file should only contain one class
  • Each subdirectory that a class is in should be represented in the name, with single underscores separating each level of the hierarchy. For example, if a class lives in Search/Form.php, its class name would be Search_Form . Similarly, if a class lives in My/Name/Is/Bob.php, its class name would be My_Name_Is_Bob . This convention has two primary benefits:

    • Locating a class file is easy (just need to know the name of the class and the base class include directory)
    • Autoloading classes is simple (the autoloader simply needs to replace underscores with directory separators)
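The underscore-to-directory mapping is small enough to show in full. The site itself is PHP; this is only a Python sketch of the name-to-path rule, with a hypothetical function name (class_to_path) and the includes/classes base directory taken from the description above:

```python
import re

# The first character must be uppercase; underscores separate non-empty segments.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$")


def class_to_path(class_name: str, base_dir: str = "includes/classes") -> str:
    """Map a PEAR-style class name to the file that should define it."""
    if not CLASS_NAME.match(class_name):
        raise ValueError(f"not a PEAR-style class name: {class_name!r}")
    relative = class_name.replace("_", "/")
    return f"{base_dir}/{relative}.php"


print(class_to_path("Search_Form"))
print(class_to_path("My_Name_Is_Bob"))
```

An autoloader built on this rule only has to compute the path and include the file when an unknown class is first referenced.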

The search results pages – both the results table and a single record – use the Model_ series of classes to display information, as well as either Search_Result (for the results summary) or Search_Single (for a detailed record view). Each row of a table (trials, trial_files, etc.) is represented by a class, which allows for easy placement of formatting functions and the like.

The Chart / Data Visualization

The chart to visualize results uses Flot, a jQuery-based plotting tool:

http://code.google.com/p/flot/

To accomplish the 'stacking' of the results, I used the stacking plugin, which is included with Flot:

http://people.iola.dk/olau/flot/examples/stacking.html

The Flot chart gets its information from a dynamic JSON dataset, which is based on a submitted search form. When a search is submitted, it's stored in the user's session for easy re-use on either the search page (to modify results) or the chart page. This is also how the “passing” of data from the chart to the results table (and vice versa) is accomplished – both just read and parse any search information in the session. The results themselves are not saved in the session, just the search query parameters.
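Flot takes its data as a list of series, each with a label and a list of [x, y] pairs. A rough sketch of how search results might be shaped into that form follows. The (year, category) row layout and the category names are assumptions made for illustration, not the site's actual schema, and the real site builds this JSON in PHP:

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def build_flot_series(rows: Iterable[Tuple[int, str]]) -> List[dict]:
    """Count rows per (year, category) and emit one Flot series per category.

    Missing (year, category) combinations are zero-filled so that every
    series covers the same x values.
    """
    rows = list(rows)
    counts = Counter(rows)
    years = sorted({year for year, _ in rows})
    labels = sorted({label for _, label in rows})
    return [
        {"label": label, "data": [[year, counts[(year, label)]] for year in years]}
        for label in labels
    ]


# Hypothetical rows, as they might come back from a stored search query.
sample = [(1730, "executed"), (1730, "pardoned"), (1731, "executed"), (1730, "executed")]
print(json.dumps(build_flot_series(sample)))
```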

The Flot initialization code is in js/init-flot.js. It's almost all 'stock' and taken from the examples, with the minor exception of a modification I had to make to get the total for each year to display under the year name on the X axis of the chart.
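One way to get per-year totals under the year names is to pass Flot explicit ticks as [value, label] pairs. The sketch below is not the modification in js/init-flot.js, whose details are not given here; it only shows how such labels could be computed, assuming the series layout Flot expects and assuming tick labels are rendered as HTML so that a line break can be used:

```python
from typing import Dict, List


def year_ticks(series: List[dict]) -> List[list]:
    """Sum every series per year and build [year, label] tick pairs."""
    totals: Dict[int, int] = {}
    for one_series in series:
        for year, count in one_series["data"]:
            totals[year] = totals.get(year, 0) + count
    return [[year, f"{year}<br>({totals[year]})"] for year in sorted(totals)]


# Hypothetical stacked series for two categories.
series = [
    {"label": "executed", "data": [[1730, 2], [1731, 1]]},
    {"label": "pardoned", "data": [[1730, 1], [1731, 0]]},
]
print(year_ticks(series))
```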

I'm overall very happy with Flot, as is SD. It's fairly simple, decently fast, and, unlike some alternatives, still under active development.

25/08/11

Permalink 10:23:56 am, by jamie, 291 words, 275 views   English (CA)
Categories: Activity Log; Mins. worked: 60

Meeting with SD; schema modifications

[SA 120116 made changes affecting what's talked about in this post, see http://hcmc.uvic.ca/blogs/index.php?blog=36&p=9010&more=1&c=1&tb=1&pb=1]

Met with SD yesterday primarily to discuss two things: how to link together criminals from different trials that are the same person, and how to link trials in different trial_files to each other (i.e. for a bunch of criminals tried for the same crime). We determined that the simplest, quickest, and therefore most prudent way to do this is to use an "id" attribute to link together the elements as required. So, if two <criminal> elements both describe the same person, then each <criminal> element in question should look like this:


<criminal id="myuniqueidentifier">
(stuff)
</criminal>

where "myuniqueidentifier" can be any string as long as it's the same for each criminal that should be treated as the same person. <criminal> elements that don't correspond to any other <criminal> element do not need an id attribute.

Similarly, for <trial> elements that refer to the same trial (but a different criminal):


<trial rel="5">
(stuff)
</trial>

In the case of trials, 'rel' does not have to be absolutely unique, just unique within one court session (i.e. the same trial_file_start_date and trial_file_end_date values). SD requested this to make it easier to keep track of the numbers.
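A rough sketch of how an importer could resolve these links: group <criminal> elements by id, and group <trial> elements by rel within a court session. The sample document is invented, and its layout (dates as child elements of <trial_file>, a <trial_files> root) is an assumption rather than the real schema. The real import script is PHP, so this Python version only illustrates the grouping logic:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample; element nesting is assumed, not taken from the schema.
SAMPLE = """
<trial_files>
  <trial_file>
    <trial_file_start_date>1730-01-16</trial_file_start_date>
    <trial_file_end_date>1730-01-20</trial_file_end_date>
    <trial rel="5"><criminal id="jsmith"/></trial>
    <trial rel="5"><criminal/></trial>
  </trial_file>
  <trial_file>
    <trial_file_start_date>1730-02-25</trial_file_start_date>
    <trial_file_end_date>1730-02-28</trial_file_end_date>
    <trial rel="5"><criminal id="jsmith"/></trial>
  </trial_file>
</trial_files>
"""


def link_records(xml_text):
    """Return (people, trials): elements grouped by id and by session + rel."""
    root = ET.fromstring(xml_text)
    people = defaultdict(list)
    trials = defaultdict(list)
    for trial_file in root.iter("trial_file"):
        session = (
            trial_file.findtext("trial_file_start_date"),
            trial_file.findtext("trial_file_end_date"),
        )
        for trial in trial_file.iter("trial"):
            rel = trial.get("rel")
            if rel is not None:
                # rel is only unique within one court session, so key on both.
                trials[session + (rel,)].append(trial)
            for criminal in trial.iter("criminal"):
                criminal_id = criminal.get("id")
                if criminal_id is not None:
                    people[criminal_id].append(criminal)
    return dict(people), dict(trials)


people, trials = link_records(SAMPLE)
print({key: len(value) for key, value in people.items()})
print({key: len(value) for key, value in trials.items()})
```

Keying trials on the session dates as well as rel keeps the two sessions' "5" values apart, while the criminal id links the same person across sessions.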

I will be able to alter the import script to account for these new attributes, but I'm not sure if SD will have a chance to go over all of the data before my contract is up.

23/08/11

Permalink 09:15:02 am, by jamie, 16 words, 97 views   English (CA)
Categories: Activity Log; Mins. worked: 20

1730-39 data received, processed, sent for proofing

Received and processed the 1730-39 data sent by SD yesterday, and sent back for final proofing.

18/08/11

Permalink 01:51:17 pm, by jamie, 222 words, 156 views   English (CA)
Categories: Activity Log; Mins. worked: 180

Losing HTTP sessions when using Apache, Tomcat, mod_rewrite, and mod_jk

In my previous post about making francotoile.uvic.ca work, I outlined writing a VirtualHost configuration that could successfully rewrite a pear.hcmc.uvic.ca Tomcat URL to francotoile.uvic.ca. It seems, however, that with this configuration, HTTP session information isn't transmitted between Tomcat and Apache. In other words, when using the URL rewritten with the VirtualHost, a new session is created on every page load and does not persist. This does not happen when using the original URL (the pear.hcmc.uvic.ca one).

I had hoped to finish this post off with "to fix this, just add...", but, sadly, I've yet to find a solution. Martin has pointed out that both http://mariage.uvic.ca/ and http://bcgenesis.uvic.ca/ , which also use mod_rewrite and mod_jk, handle sessions correctly.

It turns out that both of those sites use cookies, not sessions. Since a cookie is a natural choice in this case anyway - a language preference for the site - I've changed the code in my i18n eXist plugin to use cookies instead of sessions, which does the trick. Martin has suggested using both in tandem, so that the system tries to use sessions but falls back on a cookie if sessions are unavailable. I will try to implement this enhancement before my time is up.
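The suggested session-then-cookie fallback comes down to a short lookup order. The plugin itself runs in eXist, so the Python below only sketches the logic, and the key name "lang" and the default "en" are assumptions:

```python
from typing import Mapping, Optional


def preferred_language(
    session: Optional[Mapping[str, str]],
    cookies: Mapping[str, str],
    default: str = "en",
) -> str:
    """Prefer the session value, then the cookie, then the site default."""
    if session is not None and "lang" in session:
        return session["lang"]
    return cookies.get("lang", default)


# No usable session (as with the rewritten URL): the cookie decides.
print(preferred_language(None, {"lang": "fr"}))
```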

15/08/11

Permalink 04:25:27 pm, by jamie, 64 words, 85 views   English (CA)
Categories: Activity Log; Mins. worked: 60

Updated schema; wrote XSLT to transform old dates

Updated the schema to use the new standard date format (YYYY-MM-DD) in all date fields, namely trial_file_start_date, trial_file_end_date, respite_rr_date, and outcome_date. I wrote an XSLT transformation to convert the dates in the otherwise-valid data files to the new format. I also modified the PHP import script so that it no longer does any date conversion.

Permalink 12:48:22 pm, by jamie, 61 words, 107 views   English (CA)
Categories: Notes; Mins. worked: 0

Change to schema - date format

Because unix7.uvic.ca can't handle dates earlier than Dec 14, 1901, the proprietary date format in the XML files (currently of the form 1759Dec31) will need to be changed to the standard YYYY-MM-DD format, e.g. 1759-12-31. The XML -> MySQL import script can't do any date processing, so the date needs to be in the proper format in the original data.
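The conversion is a plain string rewrite and needs no timestamps. A minimal sketch, assuming English three-letter month abbreviations and a one- or two-digit day; the function name to_iso is hypothetical, and the actual conversion in the project was done with XSLT:

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}
OLD_DATE = re.compile(r"^(\d{4})([A-Z][a-z]{2})(\d{1,2})$")


def to_iso(old: str) -> str:
    """Convert '1759Dec31' to '1759-12-31', rejecting malformed input."""
    match = OLD_DATE.match(old.strip())
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"unrecognised date: {old!r}")
    year = int(match.group(1))
    month = MONTHS[match.group(2)]
    day = int(match.group(3))
    # date() raises ValueError for impossible days such as Feb 30.
    return date(year, month, day).isoformat()


print(to_iso("1759Dec31"))
```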

Permalink 12:04:28 pm, by jamie, 65 words, 124 views   English (CA)
Categories: Notes; Mins. worked: 0

unix.uvic.ca can't generate timestamps before Dec 14, 1901

The unix.uvic.ca machines - at least, the ones I've used - have a 32-bit architecture. Because of this, PHP's timestamp handling is limited to the range from Fri, 13 Dec 1901 20:45:54 GMT to Tue, 19 Jan 2038 03:14:07 GMT (see http://uk3.php.net/manual/en/function.date.php ). On a 64-bit architecture the range extends to roughly 292 billion years in either direction, which is effectively limitless.

This limitation poses a problem because the Bailey data importer deals exclusively with dates before 1901.
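The limits follow from storing the timestamp as a signed 32-bit count of seconds from 1 Jan 1970 UTC. A short check of the arithmetic:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

low = EPOCH + timedelta(seconds=INT32_MIN)
high = EPOCH + timedelta(seconds=INT32_MAX)

print(low.isoformat())
print(high.isoformat())

# The Bailey data ends around 1840, well below the lower limit.
last_bailey_year = datetime(1840, 12, 31, tzinfo=timezone.utc)
print(last_bailey_year < low)
```

Computed this way, the lower bound comes out at 20:45:52 on 13 Dec 1901, two seconds earlier than the figure quoted from the PHP manual; either way, every date in the Bailey data falls outside the representable range.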

12/08/11

Permalink 09:49:15 am, by jamie, 33 words, 106 views   English (CA)
Categories: Activity Log; Mins. worked: 5

Added LDAP Netlink auth to the Bailey dev site

At the request of SD, added a limited-access Netlink login to the Bailey dev site ( http://web.uvic.ca/~lang02/bailey_v2/ ). The allowed users are the usual suspects at HCMC and SD.

Permalink 09:41:28 am, by jamie, 27 words, 102 views   English (CA)
Categories: Activity Log; Mins. worked: 20

1740-49 data received, converted, sent for proofing

Received the 1740-49 XLS file from SD last night. Converted it to XML this morning, fixed some validation errors, and sent back to SD for final proofing.

05/08/11

Permalink 04:09:44 pm, by jamie, 31 words, 245 views   English (CA)
Categories: Notes; Mins. worked: 0

How to toggle a boolean or tinyint field in MySQL

If you have a boolean or tinyint(1) field in MySQL and you need to toggle the value without knowing in advance what the value currently is:

UPDATE my_table SET my_field = NOT my_field;
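The toggle can be tried without a MySQL server. This sketch uses Python's built-in SQLite module as a stand-in, on the assumption that logical negation of a 0/1 column behaves the same way there; the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, active INTEGER NOT NULL)")
conn.executemany("INSERT INTO items (id, active) VALUES (?, ?)", [(1, 0), (2, 1)])

# Flip every row without reading the current value first.
conn.execute("UPDATE items SET active = NOT active")

rows = conn.execute("SELECT id, active FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 1), (2, 0)]
```

Adding a WHERE clause limits the toggle to specific rows.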


Capital Trials at the Old Bailey

Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases between 1710 and 1840 in London heard at the Old Bailey court. This project will create a web-based database which will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.
