Original website (i.e. Stew's version): http://web.uvic.ca/~lang02/bailey/
New development website (note: requires Netlink login): http://web.uvic.ca/~lang02/bailey_v2/
The data has gone through a few schema changes, all documented on the blog. See, in chronological order (i.e. earliest first):
For each set of data that SD sends, I do the following:
Once I get the file back from SD, I do the following:
As noted at the top of this document, the schema file is at http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng . SD validates against this file so it always needs to be current.
On the search form, the jQuery plugin bsmSelect is used to make multiple select fields more user-friendly. The plugin homepage is here: http://plugins.jquery.com/project/bsmSelect , and the blog post is here: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=7864&more=1&c=1&tb=1&pb=1
The site search – both the form and the results – is handled by the Search suite of classes located at includes/classes (Search.php and the Search subdirectory). The Search_Form class, located at Search/Form.php, is responsible for displaying all of the search fields and populating them with the proper values. All of the classes are fully documented.
The site uses the Zend Framework's DB library for database interaction. The classes are located at includes/classes/Zend . All of the classes used in the site follow the PEAR class naming convention for easy autoloading. The PEAR convention dictates the following:
Each subdirectory that a class is in should be represented in the name, with single underscores separating each level of the hierarchy. For example, if a class lives in Search/Form.php, its class name would be Search_Form . Similarly, if a class lives in My/Name/Is/Bob.php, its class name would be My_Name_Is_Bob . This convention has two primary benefits:
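The name-to-path mapping described above is mechanical, which is what makes autoloading straightforward. Here is a minimal sketch of that mapping, written in Python purely for illustration (the site itself is PHP, and this is not the site's autoloader). The function names are hypothetical; the base directory is the includes/classes location mentioned above.

```python
from pathlib import PurePosixPath


def class_to_path(class_name: str, base_dir: str = "includes/classes") -> str:
    """Map a PEAR-style class name to the file expected to define it."""
    # Each underscore marks one directory level: Search_Form -> Search/Form.php
    relative = class_name.replace("_", "/") + ".php"
    return str(PurePosixPath(base_dir) / relative)


def path_to_class(relative_path: str) -> str:
    """Inverse mapping: Search/Form.php -> Search_Form."""
    # Drop the .php suffix, then join the path components with underscores.
    return "_".join(PurePosixPath(relative_path).with_suffix("").parts)


if __name__ == "__main__":
    print(class_to_path("Search_Form"))
    print(path_to_class("My/Name/Is/Bob.php"))
```

Because the file path can be derived from the class name alone, an autoloader needs no lookup table of classes.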
The search results pages – both the results table and a single record – use the Model_ series of classes to display information, as well as either Search_Result (for the results summary) or Search_Single (for a detailed record view). Each row of a table (trials, trial_files, etc.) is represented by a class, which allows for easy placement of formatting functions and the like.
The Chart / Data Visualization
The chart to visualize results uses Flot, a jQuery-based plotting tool:
To accomplish the 'stacking' of the results, I used the stacking plugin, which is included with Flot:
The Flot chart gets its information from a dynamic JSON dataset, which is based on a submitted search form. When a search is submitted, it's stored in the user's session for easy re-use on either the search page (to modify results) or the chart page. This is also how the “passing” of data from the chart to the results table (and vice versa) is accomplished – both just read and parse any search information in the session. The results themselves are not saved in the session, just the search query parameters.
The Flot initialization code is in js/init-flot.js. It's almost all 'stock' and taken from the examples, with the minor exception of a modification I had to make to display the total for each year under the year name on the X axis of the chart.
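To make the data flow concrete, here is a rough sketch of how a server-side script might turn query results into a JSON payload for a stacked Flot chart, with per-year totals in the X axis labels. It is written in Python for illustration only and is not the site's code. The input row shape (year, category, count), the function name, the payload keys, and the use of an HTML line break in the tick label are all assumptions. What Flot itself expects is a list of series objects with "label" and "data" keys, and ticks given as [value, label] pairs.

```python
import json
from collections import defaultdict


def build_flot_payload(rows):
    """Build series and ticks from (year, category, count) rows (hypothetical shape)."""
    per_category = defaultdict(dict)  # category -> {year: count}
    totals = defaultdict(int)         # year -> total across all categories
    for year, category, count in rows:
        per_category[category][year] = per_category[category].get(year, 0) + count
        totals[year] += count

    years = sorted(totals)
    # One series per category. Every series lists every year (0 if absent)
    # so that the stacked values line up on the same X positions.
    series = [
        {"label": category, "data": [[y, counts.get(y, 0)] for y in years]}
        for category, counts in sorted(per_category.items())
    ]
    # Tick label carries the year plus its total (label format is an assumption).
    ticks = [[y, f"{y}<br>({totals[y]})"] for y in years]
    return {"series": series, "ticks": ticks}


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [(1710, "robbery", 2), (1710, "murder", 1), (1711, "robbery", 3)]
    print(json.dumps(build_flot_payload(sample), indent=2))
```

Only the search parameters would need to live in the session; a payload like this can be rebuilt from them on each request.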
I'm overall very happy with Flot, as is SD. It's fairly simple, decently fast, and, unlike some alternatives, still under active development.
[SA 120116 made changes affecting what's talked about in this post, see http://hcmc.uvic.ca/blogs/index.php?blog=36&p=9010&more=1&c=1&tb=1&pb=1]
Met with SD yesterday primarily to discuss two things: how to link together criminals from different trials that are the same person, and how to link trials in different trial_files to each other (i.e. for a bunch of criminals tried for the same crime). We determined that the simplest, quickest, and therefore most prudent way to do this is to use an "id" attribute to link together the elements as required. So, if two <criminal> elements both describe the same person, then each <criminal> element in question should look like this:
<criminal id="myuniqueidentifier"> (stuff) </criminal>
where "myuniqueidentifier" can be any string as long as it's the same for each criminal that should be treated as the same person. <criminal> elements that don't correspond to any other <criminal> element do not need an id attribute.
Similarly, for <trial> elements that refer to the same trial (but a different criminal):
<trial rel="5"> (stuff) </trial>
In the case of trials, 'rel' does not have to be absolutely unique, just unique within one court session (i.e. the same trial_file_start_date and trial_file_end_date values). SD requested this to make it easier to keep track of the numbers.
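The two linking rules above can be sketched as a grouping step: criminals are grouped by "id" alone, while trials are grouped by "rel" together with the court session dates. The Python below is an illustration only, not the import script. The XML layout (a <trial_file> element carrying the two dates as attributes) and the sample data are invented for the example, since the real structure is defined by the schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample data; the real layout is defined by the project's schema.
SAMPLE = """
<trial_files>
  <trial_file trial_file_start_date="1759-01-17" trial_file_end_date="1759-01-20">
    <trial rel="5"><criminal id="p1">person A</criminal></trial>
  </trial_file>
  <trial_file trial_file_start_date="1759-01-17" trial_file_end_date="1759-01-20">
    <trial rel="5"><criminal>person B</criminal></trial>
  </trial_file>
  <trial_file trial_file_start_date="1759-02-28" trial_file_end_date="1759-03-03">
    <trial rel="5"><criminal id="p1">person A</criminal></trial>
  </trial_file>
</trial_files>
"""


def link_records(xml_text):
    """Group criminals by id, and trials by (start date, end date, rel)."""
    root = ET.fromstring(xml_text)
    people = defaultdict(list)  # id -> text of each <criminal> sharing that id
    trials = defaultdict(list)  # (start, end, rel) -> criminals in linked trials
    for trial_file in root.iter("trial_file"):
        session = (
            trial_file.get("trial_file_start_date"),
            trial_file.get("trial_file_end_date"),
        )
        for trial in trial_file.iter("trial"):
            rel = trial.get("rel")
            for criminal in trial.iter("criminal"):
                if criminal.get("id") is not None:
                    people[criminal.get("id")].append(criminal.text)
                if rel is not None:
                    # rel is only unique within one court session,
                    # so the session dates are part of the key.
                    trials[session + (rel,)].append(criminal.text)
    return dict(people), dict(trials)


if __name__ == "__main__":
    people, trials = link_records(SAMPLE)
    print(people)
    print(trials)
```

In the sample, rel="5" appears in two different court sessions and yields two separate groups, while id="p1" links the same person across sessions.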
I will be able to alter the import script to account for these new attributes, but I'm not sure if SD will have a chance to go over all of the data before my contract is up.
In my previous post about making francotoile.uvic.ca work, I outlined writing a VirtualHost configuration that could successfully rewrite a pear.hcmc.uvic.ca Tomcat URL to francotoile.uvic.ca. It seems, however, that with this configuration, HTTP session information isn't transmitted between Tomcat and Apache. In other words, when using the URL rewritten with the VirtualHost, a new session is created every page load and does not persist. This does not happen when using the original URL (the pear.hcmc.uvic.ca one).
I had hoped to finish this post off with "to fix this, just add...", but, sadly, I've yet to find a solution. Martin has pointed out that both http://mariage.uvic.ca/ and http://bcgenesis.uvic.ca/ , which also use mod_rewrite and mod_jk, handle sessions correctly.
Because unix7.uvic.ca can't handle dates earlier than Dec 14, 1901, the proprietary date format in the XML files - currently 1759Dec31 - will need to be changed to a standard YYYY-MM-DD format, i.e. 1759-12-31. The XML -> MySQL import script can't do any date processing, so the date needs to be in the proper format in the original data.
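Since the conversion has to happen in the source data, a small one-off script could do it. Below is a minimal sketch in Python that converts the existing format to YYYY-MM-DD without going through Unix timestamps at all. It assumes the month is always a three-letter English abbreviation, as in the single example above; the function name is hypothetical.

```python
import re
from datetime import date

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Four-digit year, three-letter month, one- or two-digit day, e.g. 1759Dec31
LEGACY_DATE = re.compile(r"^(\d{4})([A-Za-z]{3})(\d{1,2})$")


def to_iso_date(legacy: str) -> str:
    """Convert a date such as '1759Dec31' to '1759-12-31'."""
    match = LEGACY_DATE.match(legacy.strip())
    if match is None:
        raise ValueError(f"unrecognized date format: {legacy!r}")
    year, month_name, day = match.groups()
    month = MONTHS.get(month_name.capitalize())
    if month is None:
        raise ValueError(f"unknown month abbreviation: {month_name!r}")
    # datetime.date rejects impossible days (e.g. Feb 30) and is not
    # limited by the 32-bit timestamp range.
    return date(int(year), month, int(day)).isoformat()


if __name__ == "__main__":
    print(to_iso_date("1759Dec31"))
```

Any value that does not match the expected pattern raises an error rather than being silently passed through, so bad dates surface before the import.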
The unix.uvic.ca machines (at least, the ones I've used) have a 32-bit architecture. Because of this, PHP's timestamp handling is limited to the range from Fri, 13 Dec 1901 20:45:54 GMT to Tue, 19 Jan 2038 03:14:07 GMT (see http://uk3.php.net/manual/en/function.date.php ). On a 64-bit architecture, timestamps are stored in a 64-bit integer, which extends the range far beyond any date this project needs.
This limitation poses a problem because the Bailey data importer deals exclusively with dates before 1901.
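The limit comes from storing the timestamp, a count of seconds since 1 Jan 1970 UTC, in a signed 32-bit integer. The short Python sketch below checks whether a given date fits in that range. It is an illustration of the arithmetic only, not PHP's implementation, and the function names are hypothetical.

```python
from datetime import datetime, timezone

INT32_MIN = -2**31       # smallest value of a signed 32-bit integer
INT32_MAX = 2**31 - 1    # largest value of a signed 32-bit integer
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_seconds(year: int, month: int, day: int) -> int:
    """Seconds between the Unix epoch and midnight UTC on the given date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int((moment - EPOCH).total_seconds())


def fits_in_int32(year: int, month: int, day: int) -> bool:
    """True if the date can be represented as a signed 32-bit timestamp."""
    return INT32_MIN <= unix_seconds(year, month, day) <= INT32_MAX


if __name__ == "__main__":
    for ymd in [(1759, 12, 31), (1901, 12, 14), (2038, 1, 19), (2038, 1, 20)]:
        print(ymd, fits_in_int32(*ymd))
```

A date like 1759-12-31 falls far below the lower bound, which is why the importer cannot rely on timestamp-based date functions on these machines.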
UPDATE table SET field = !field
I've finished writing and testing a PHP script to import an XML file (validated against http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng ) into the MySQL database. The script uses SimpleXMLElement to load and loop through the file. The database handling is done by Zend_Db, since that's already being used in the main Bailey website.
I wrote the importer in such a way that new foreign keys (e.g. crime_normalizeds, judges, etc.) don't need to be entered into the MySQL database beforehand. Since the XML is validated against the schema, I can be reasonably certain that all of the values in the data are intentional. So, if the importer comes across a value that's not already in the MySQL database (say, a new judge), it simply inserts that value and continues with the import.
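The "insert it if it's missing" behaviour described above is a standard get-or-create lookup. Here is a minimal sketch of the pattern in Python, with the standard library's sqlite3 standing in for MySQL and Zend_Db. The table layout (an id plus a name column), the whitelist of lookup tables, and the function name are assumptions made for the example, not the importer's actual code.

```python
import sqlite3

# Hypothetical whitelist of lookup tables. Table names cannot be bound as
# query parameters, so they are checked against a fixed set instead.
LOOKUP_TABLES = {"judges", "crime_normalizeds"}


def get_or_create_id(conn, table, name):
    """Return the id of `name` in `table`, inserting a new row if needed."""
    if table not in LOOKUP_TABLES:
        raise ValueError(f"unexpected lookup table: {table!r}")
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]  # value already known: reuse its key
    cursor = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cursor.lastrowid  # new value: insert it and carry on


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE judges (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    first = get_or_create_id(conn, "judges", "example judge")
    second = get_or_create_id(conn, "judges", "example judge")
    print(first == second)
```

The trade-off is that a misspelled value which still passes schema validation would create a new lookup row instead of raising an error, so this approach relies on the upstream validation being trustworthy.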
The importer isn't yet online, but once it is I'll publish its URL to this blog.
Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases between 1710 and 1840 in London heard at the Old Bailey court. This project will create a web-based database which will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.