Original website (i.e. Stew's version): http://web.uvic.ca/~lang02/bailey/
New development website (note: requires Netlink login): http://web.uvic.ca/~lang02/bailey_v2/
XML schema: http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng
The Data
The data has gone through a few schema changes, all documented on the blog. See, in chronological order (i.e. earliest first):
For each set of data that SD sends, I do the following:
Once I get the file back from SD, I do the following:
Validation
As noted at the top of this document, the schema file is at http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng . SD validates against this file so it always needs to be current.
The Website
The development website ( http://web.uvic.ca/~lang02/bailey_v2/ ) is feature-complete, mostly design-complete (I went for a muted look), and validates as HTML5. The site uses jQuery-driven JavaScript to handle AJAX requests.
On the search form, the jQuery plugin bsmSelect is used to make multiple-select fields more user-friendly. The plugin homepage is here: http://plugins.jquery.com/project/bsmSelect , and the blog post is here: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=7864&more=1&c=1&tb=1&pb=1
The site search – both the form and the results – is handled by the Search suite of classes located at includes/classes (Search.php and the Search subdirectory). The Search_Form class, located at Search/Form.php, is responsible for displaying all of the search fields and populating them with the proper values. All of the classes are fully documented.
The site uses the Zend Framework's DB library for database interaction. The classes are located at includes/classes/Zend . All of the classes used in the site follow the PEAR class naming convention for easy autoloading. The PEAR convention dictates the following:
Each subdirectory that a class is in should be represented in the name, with single underscores separating each level of the hierarchy. For example, if a class lives in Search/Form.php, its class name would be Search_Form . Similarly, if a class lives in My/Name/Is/Bob.php, its class name would be My_Name_Is_Bob . This convention has two primary benefits:
The search results pages – both the results table and a single record – use the Model_ series of classes to display information, as well as either Search_Result (for the results summary) or Search_Single (for a detailed record view). Each row of a table (trials, trial_files, etc.) is represented by a class, which allows for easy placement of formatting functions and the like.
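The PEAR naming convention described above is what makes autoloading simple: a class name can be turned into a file path mechanically. The following is a minimal sketch of that mapping, written in Python purely for illustration (the site itself is PHP). The function names are hypothetical; the base directory includes/classes is the one mentioned above.

```python
from pathlib import PurePosixPath


def class_name_to_path(class_name: str, base_dir: str = "includes/classes") -> str:
    """Map a PEAR-style class name to the file expected to define it.

    Each underscore in the class name stands for one directory level.
    """
    relative = class_name.replace("_", "/") + ".php"
    return str(PurePosixPath(base_dir) / relative)


def path_to_class_name(relative_path: str) -> str:
    """Inverse mapping: derive the class name from a path relative to the class root."""
    if relative_path.endswith(".php"):
        relative_path = relative_path[: -len(".php")]
    return relative_path.replace("/", "_")


if __name__ == "__main__":
    print(class_name_to_path("Search_Form"))
    print(path_to_class_name("My/Name/Is/Bob.php"))
```

An autoloader built on this idea only needs to perform the first mapping and include the resulting file when an unknown class is requested.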
The Chart / Data Visualization
The chart to visualize results uses Flot, a jQuery-based plotting tool:
http://code.google.com/p/flot/
To accomplish the 'stacking' of the results, I used the stacking plugin, which is included with Flot:
http://people.iola.dk/olau/flot/examples/stacking.html
The Flot chart gets its information from a dynamic JSON dataset, which is based on a submitted search form. When a search is submitted, it's stored in the user's session for easy re-use on either the search page (to modify results) or the chart page. This is also how the “passing” of data from the chart to the results table (and vice versa) is accomplished – both just read and parse any search information in the session. The results themselves are not saved in the session, just the search query parameters.
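To make the shape of that JSON dataset concrete, here is a rough sketch of how search results could be grouped into one Flot series per outcome, with one [year, count] point per year. It is in Python for illustration only and is not the site's actual PHP code. The function name and the sample outcome labels are made up; the only thing taken from Flot is its series format, a list of objects with a label and a data array of [x, y] pairs.

```python
import json
from collections import defaultdict


def build_stacked_series(rows):
    """Group (year, outcome) pairs into Flot-style series.

    Every series gets a point for every year (zero where there are no
    records), which keeps the series aligned for stacking.
    """
    counts = defaultdict(lambda: defaultdict(int))
    years = set()
    for year, outcome in rows:
        counts[outcome][year] += 1
        years.add(year)

    ordered_years = sorted(years)
    return [
        {
            "label": outcome,
            "data": [[year, counts[outcome].get(year, 0)] for year in ordered_years],
        }
        for outcome in sorted(counts)
    ]


if __name__ == "__main__":
    # Hypothetical result rows; real rows would come from re-running the
    # search stored in the session.
    sample = [(1760, "executed"), (1760, "pardoned"), (1761, "executed")]
    print(json.dumps(build_stacked_series(sample)))
```

Because only the query parameters live in the session, a function like this would run against fresh query results each time the chart page is loaded.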
The Flot initialization code is in js/init-flot.js; it's almost all 'stock' and taken from the examples, with the minor exception of a modification I had to make to get the totals for each year to display under the year name on the X axis of the chart.
I'm overall very happy with Flot, as is SD. It's fairly simple, decently fast, and, unlike some alternatives, still under active development.
Because unix7.uvic.ca can't handle dates earlier than Dec 14, 1901, the proprietary date format in the XML files (currently 1759Dec31) will need to be changed to the standard YYYY-MM-DD format, e.g. 1759-12-31. The XML -> MySQL import script can't do any date processing, so the date needs to be in the proper format in the original data.
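As a sketch of the one-off conversion this implies, the snippet below rewrites a value such as 1759Dec31 as 1759-12-31 without going through Unix timestamps at all. It is Python for illustration, not part of the site. It assumes the month is always a three-letter English abbreviation and the day is one or two digits; only 1759Dec31 is attested here, so the other forms are guesses. The function name is hypothetical.

```python
import re
from datetime import date

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Four-digit year, three-letter month, one- or two-digit day (assumed format).
_PATTERN = re.compile(r"^(\d{4})([A-Z][a-z]{2})(\d{1,2})$")


def to_iso_date(raw: str) -> str:
    """Convert e.g. '1759Dec31' to '1759-12-31'; raise ValueError if malformed."""
    match = _PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised date format: {raw!r}")
    year, month_abbr, day = match.groups()
    if month_abbr not in MONTHS:
        raise ValueError(f"unknown month abbreviation: {month_abbr!r}")
    # date() rejects impossible dates such as 30 February.
    return date(int(year), MONTHS[month_abbr], int(day)).isoformat()


if __name__ == "__main__":
    print(to_iso_date("1759Dec31"))
```

Working with calendar dates rather than timestamps sidesteps the pre-1901 limitation entirely.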
The unix.uvic.ca machines (at least, the ones I've used) have a 32-bit architecture. Because of this, PHP's timestamp handling is limited to the range from Fri, 13 Dec 1901 20:45:54 GMT to Tue, 19 Jan 2038 03:14:07 GMT (see http://uk3.php.net/manual/en/function.date.php ). A 64-bit architecture allows for basically limitless timestamps.
This limitation poses a problem because the Bailey data importer deals exclusively with dates before 1901.
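The range quoted above follows directly from storing seconds since 1 Jan 1970 in a signed 32-bit integer. The short sketch below (Python, for illustration only; the names are mine) derives both bounds and checks whether a given date fits. Computed this way, the lower bound comes out at 20:45:52 GMT on 13 Dec 1901, within a couple of seconds of the figure quoted from the PHP manual.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def timestamp_range_32bit():
    """Return the earliest and latest moments a signed 32-bit timestamp can hold."""
    return (
        EPOCH + timedelta(seconds=INT32_MIN),
        EPOCH + timedelta(seconds=INT32_MAX),
    )


def fits_in_32bit_timestamp(moment: datetime) -> bool:
    """True if the timezone-aware moment is representable as a 32-bit timestamp."""
    low, high = timestamp_range_32bit()
    return low <= moment <= high


if __name__ == "__main__":
    low, high = timestamp_range_32bit()
    print(low.isoformat(), high.isoformat())
    print(fits_in_32bit_timestamp(datetime(1759, 12, 31, tzinfo=timezone.utc)))
```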
UPDATE table SET field = !field
I've decided to scrap the eXist rebuild for a few reasons:
The XML schema was not meant as a final destination for the data. SA created it as an ad hoc intermediary step between SD's raw data and the MySQL database. So, the schema doesn't lend itself well to being the backbone of an eXist architecture, particularly when it comes to searching and generating human-readable values. The structure of the data is highly relational and is a good fit for SQL, which, of course, was SA's original intention.
That said, my original motivation for the eXist rebuild still stands. The current method of translating the XML to SQL involves using XSLT transformations to generate SQL statements. However, every time the schema is updated with new structure or values (which is happening a lot), the XSLT stylesheets need to be updated as well. So, to get around these difficulties, I'm going to write a PHP script that will convert the XML to MySQL, likely using SimpleXML. It won't be the fastest script I've ever written, but since it will be a one-use-per-dataset kind of thing, speed isn't a big issue.
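The general shape of such a converter is: parse the XML, pull the values out of each record, and insert them with parameterized statements rather than generating SQL text. The sketch below shows that shape in Python for illustration only; the planned script is PHP with SimpleXML. Everything specific in it is an assumption: the element names (trials, trial, date, crime, outcome), the column names, and the use of an in-memory SQLite database as a stand-in for MySQL. The real structure is defined by the RNG schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real element names come from the RNG schema.
SAMPLE_XML = """<trials>
  <trial id="t1"><date>1759-12-31</date><crime>robbery</crime><outcome>executed</outcome></trial>
  <trial id="t2"><date>1760-01-15</date><crime>burglary</crime><outcome>pardoned</outcome></trial>
</trials>"""


def import_trials(xml_text: str, connection: sqlite3.Connection) -> int:
    """Insert one row per <trial> element and return the number of rows inserted."""
    root = ET.fromstring(xml_text)
    rows = [
        (
            trial.get("id"),
            trial.findtext("date"),
            trial.findtext("crime"),
            trial.findtext("outcome"),
        )
        for trial in root.iter("trial")
    ]
    connection.execute(
        "CREATE TABLE IF NOT EXISTS trials "
        "(id TEXT PRIMARY KEY, trial_date TEXT, crime TEXT, outcome TEXT)"
    )
    # Placeholders let the driver handle quoting; no SQL strings are built by hand.
    connection.executemany("INSERT INTO trials VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(import_trials(SAMPLE_XML, conn))
```

The appeal over the XSLT route is that a schema change typically means touching a field list like the one above rather than a stylesheet that emits SQL text.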
After some consultation with Greg I've decided to scrap the PHP/MySQL version of Bailey that I built and write the site in eXist. After building FrancoToile and the Lansdowne Lecture site in eXist I'm comfortable working with it, and this way the Bailey XML data can be used directly without being shoehorned into a MySQL database. The allowed values for crimes, outcomes, judges, etc. are constantly changing in the schema, so maintaining those keys in MySQL would be a long-term pain. Using eXist cuts out the XML-to-MySQL middle man. I don't expect the basic functionality of the site to take long; it'll likely be done sometime next week.
Received the 1770-79 XLS spreadsheet from SD. Minor change to the format as he's added "Outcome Durn - Yrs" and "Outcome Durn - Other" to handle outcome duration values (see previous blog post: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8333&more=1&c=1&tb=1&pb=1 ).
Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases heard at the Old Bailey court in London between 1710 and 1840. This project will create a web-based database which will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.