Category: Notes

31/08/11

Permalink 01:00:58 pm, by jamie, 992 words, 238 views   English (CA)
Categories: Notes; Mins. worked: 0

Notes upon my departure

Original website (i.e. Stew's version): http://web.uvic.ca/~lang02/bailey/

New development website (note: requires Netlink login): http://web.uvic.ca/~lang02/bailey_v2/

XML schema: http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng

The Data

The data has gone through a few schema changes, all documented on the blog. See, in chronological order (i.e. earliest first):

For each set of data that SD sends, I do the following:

  1. Make a new folder in BaileyCapitalTrials/data/ named after the year range for the data, e.g. 1730-39

  2. Put SD's XLS file in that folder

  3. Follow the import steps to convert the XLS to XML, saving all of the step_*.xml files in the newly created folder: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8283&more=1&c=1&tb=1&pb=1

  4. Make a copy of step_3.xml and save it as 1730_1739_for_proofing.xml (using 1730-39 as an example)

  5. Send that file to SD for proofing

  6. Post on the blog that the data was received, processed, and sent for proofing
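The naming in the steps above is mechanical enough to script. Below is a minimal Python sketch that assumes the folder layout described above. The helper names are hypothetical, and the XLS-to-XML conversion itself (step 3) is not reproduced:

```python
import re
import shutil
from pathlib import Path


def proofing_name(year_range: str) -> str:
    """Turn a folder name like '1730-39' into '1730_1739_for_proofing.xml'."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", year_range)
    if match is None:
        raise ValueError(f"expected a range like 1730-39, got {year_range!r}")
    start = match.group(1)
    # Assumes a decade that stays within one century, like the folders above.
    end = start[:2] + match.group(2)
    return f"{start}_{end}_for_proofing.xml"


def prepare_for_proofing(data_dir: Path, year_range: str) -> Path:
    """Hypothetical helper for steps 1 and 4: make the folder, copy step_3.xml."""
    folder = data_dir / year_range
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / proofing_name(year_range)
    shutil.copyfile(folder / "step_3.xml", target)
    return target
```

For example, proofing_name("1730-39") gives 1730_1739_for_proofing.xml.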

Once I get the file back from SD, I do the following:

  1. Save as 1730_1739_proofed_by_simon.xml in the proper folder (see above)

  2. Validate the file in Oxygen and fix any errors. There shouldn't be any at this point, but SD lets the odd typo slip through the cracks.

  3. Make a copy, name it 1730-1739.xml and put it in BaileyCapitalTrials/data/finished_data/

  4. Once the file is 100% valid, import it into the website: http://web.uvic.ca/~lang02/bailey_v2/import/index.php

  5. Post on the blog that the data was validated
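The file handling for a returned dataset (steps 1 and 3) can be sketched the same way, again assuming the folder layout above. The function names are hypothetical. Before copying, the sketch runs a quick well-formedness check with Python's standard xml.etree.ElementTree. That check is not a substitute for RELAX NG validation in Oxygen, which is what actually catches schema errors:

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def expand_range(year_range: str) -> tuple[str, str]:
    """'1730-39' -> ('1730', '1739'); assumes a decade within one century."""
    start, end = year_range.split("-")
    return start, start[:2] + end


def finish_proofed_file(data_dir: Path, year_range: str) -> Path:
    """Copy the proofed file into finished_data/ once it parses as XML."""
    start, end = expand_range(year_range)
    proofed = data_dir / year_range / f"{start}_{end}_proofed_by_simon.xml"
    # ET.parse raises ET.ParseError on malformed XML (e.g. an unescaped '&').
    ET.parse(proofed)
    finished_dir = data_dir / "finished_data"
    finished_dir.mkdir(exist_ok=True)
    target = finished_dir / f"{start}-{end}.xml"
    shutil.copyfile(proofed, target)
    return target
```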

Validation

As noted at the top of this document, the schema file is at http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng . SD validates against this file, so it must always be kept current.

The Website

The development website ( http://web.uvic.ca/~lang02/bailey_v2/ ) is feature-complete, mostly design-complete (I went for a muted look) and validates as HTML5. The site uses jQuery-driven JavaScript to handle AJAX requests.

On the search form, the jQuery plugin bsmSelect is used to make multiple select fields more user-friendly. The plugin homepage is here: http://plugins.jquery.com/project/bsmSelect , and the blog post is here: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=7864&more=1&c=1&tb=1&pb=1

The site search – both the form and the results – is handled by the Search suite of classes located at includes/classes (Search.php and the Search subdirectory). The Search_Form class, located at Search/Form.php, is responsible for displaying all of the search fields and populating them with the proper values. All of the classes are fully documented.

The site uses the Zend Framework's DB library for database interaction. The classes are located at includes/classes/Zend . All of the classes used in the site follow the PEAR class naming convention for easy autoloading. The PEAR convention dictates the following:

  • Class names should begin with an uppercase letter
  • Each file should only contain one class
  • Each subdirectory that a class is in should be represented in the name, with single underscores separating each level of the hierarchy. For example, if a class lives in Search/Form.php, its class name would be Search_Form . Similarly, if a class lives in My/Name/Is/Bob.php, its class name would be My_Name_Is_Bob .

This convention has two primary benefits:

  • Locating a class file is easy (you only need to know the name of the class and the base class include directory)
  • Autoloading classes is simple (the autoloader just replaces underscores with directory separators and appends .php)
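The PEAR name-to-path rule fits in a few lines. In PHP, this rule would normally live in a function registered with spl_autoload_register. The Python below only illustrates the mapping itself, and the default base directory is an assumption taken from the paths mentioned above:

```python
from pathlib import PurePosixPath


def class_to_path(class_name: str, base_dir: str = "includes/classes") -> PurePosixPath:
    """Map a PEAR-style class name to its file, e.g. Search_Form -> Search/Form.php."""
    parts = class_name.split("_")
    if not class_name[:1].isupper() or "" in parts:
        raise ValueError(f"not a PEAR-style class name: {class_name!r}")
    # Each underscore becomes a directory separator; the last part is the file.
    return PurePosixPath(base_dir, *parts[:-1], parts[-1] + ".php")


def path_to_class(relative_path: str) -> str:
    """Inverse mapping: Search/Form.php -> Search_Form."""
    return "_".join(PurePosixPath(relative_path).with_suffix("").parts)
```

For instance, class_to_path("My_Name_Is_Bob") gives includes/classes/My/Name/Is/Bob.php. This two-way mapping is why locating a file and autoloading it are the same operation.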

The search results pages – both the results table and a single record – use the Model_ series of classes to display information, as well as either Search_Result (for the results summary) or Search_Single (for a detailed record view). Each row of a table (trials, trial_files, etc.) is represented by a class, which allows for easy placement of formatting functions and the like.
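The one-class-per-table idea looks roughly like the sketch below. The real Model_ classes are PHP. The class name, column names and formatting methods here are hypothetical. The point is that display formatting lives on the row object rather than in the page templates:

```python
from dataclasses import dataclass
from datetime import date

# Spelled out rather than using strftime("%B"), which depends on the locale.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


@dataclass
class TrialModel:
    """One row of a hypothetical trials table, plus its display helpers."""

    trial_id: int
    trial_date: date
    defendant_name: str
    crime: str
    outcome: str

    @classmethod
    def from_row(cls, row: dict) -> "TrialModel":
        """Build a model from a database row fetched as a dict of strings."""
        return cls(
            trial_id=int(row["id"]),
            trial_date=date.fromisoformat(row["trial_date"]),
            defendant_name=row["defendant_name"],
            crime=row["crime"],
            outcome=row["outcome"],
        )

    def formatted_date(self) -> str:
        """Human-readable date, e.g. 31 December 1759."""
        d = self.trial_date
        return f"{d.day} {MONTH_NAMES[d.month - 1]} {d.year}"

    def summary(self) -> str:
        """Short line for the results table."""
        return f"{self.defendant_name}: {self.crime} ({self.outcome})"
```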

The Chart / Data Visualization

The chart to visualize results uses Flot, a jQuery-based plotting tool:

http://code.google.com/p/flot/

To accomplish the 'stacking' of the results, I used the stacking plugin, which is included with Flot:

http://people.iola.dk/olau/flot/examples/stacking.html

The Flot chart gets its information from a dynamic JSON dataset, which is based on a submitted search form. When a search is submitted, it's stored in the user's session for easy re-use on either the search page (to modify results) or the chart page. This is also how the “passing” of data from the chart to the results table (and vice versa) is accomplished – both just read and parse any search information in the session. The results themselves are not saved in the session, just the search query parameters.
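The session arrangement can be sketched as follows. Only the search parameters are stored, and each page re-runs its own query from them. The chart side emits the series format Flot accepts: a list of objects with label and data, where data is a list of [x, y] pairs. Stacking is then enabled client-side with series: { stack: true }. The session dict, the session key and the run_query callback are assumptions for illustration, not the site's PHP code:

```python
import json
from collections import defaultdict
from typing import Callable, Iterable

SESSION_KEY = "bailey_search"  # hypothetical session key


def save_search(session: dict, form: dict) -> None:
    """Store only the submitted search parameters, never the results."""
    session[SESSION_KEY] = {key: value for key, value in form.items() if value}


def load_search(session: dict) -> dict:
    """Both the results page and the chart page start from this."""
    return dict(session.get(SESSION_KEY, {}))


def chart_json(session: dict, run_query: Callable[[dict], Iterable[tuple[int, str]]]) -> str:
    """Build Flot series: one series per outcome, counting trials per year.

    run_query is a hypothetical callback returning (year, outcome) pairs
    that match the stored search parameters.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, outcome in run_query(load_search(session)):
        counts[outcome][year] += 1
    years = sorted({year for per_year in counts.values() for year in per_year})
    series = [
        # Fill missing years with 0 so the stacked series line up on the x axis.
        {"label": outcome, "data": [[year, per_year.get(year, 0)] for year in years]}
        for outcome, per_year in sorted(counts.items())
    ]
    return json.dumps(series)
```

Because the results are never cached, the chart and the results table can't drift out of sync. Each page simply re-runs the same stored query.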

The Flot initialization code is in js/init-flot.js. It's almost all 'stock' and taken from the examples, with one minor exception: a modification I had to make so that the total for each year displays under the year label on the X axis of the chart.
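The actual change to init-flot.js isn't reproduced here. One way to get a similar effect, offered as an assumption rather than a description of that code, is to compute the tick labels server-side. They can then be passed through Flot's xaxis.ticks option, which accepts an array of [value, label] pairs. Whether the <br> in the label renders as a line break depends on how the Flot version in use draws tick labels:

```python
import json


def year_ticks(series: list[dict]) -> list[list]:
    """Build x-axis ticks of the form [year, "year<br>total"].

    `series` is a list of Flot series ({"label": ..., "data": [[year, count], ...]});
    a year's total is the sum of its counts across all stacked series.
    """
    totals: dict[int, int] = {}
    for s in series:
        for year, count in s["data"]:
            totals[year] = totals.get(year, 0) + count
    return [[year, f"{year}<br>{total}"] for year, total in sorted(totals.items())]


def flot_options(series: list[dict]) -> str:
    """Options object for $.plot(): stacked series plus the custom ticks."""
    return json.dumps({"series": {"stack": True}, "xaxis": {"ticks": year_ticks(series)}})
```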

Overall, I'm very happy with Flot, as is SD. It's fairly simple, decently fast, and, unlike some alternatives, still under active development.

15/08/11

Permalink 12:48:22 pm, by jamie, 61 words, 102 views   English (CA)
Categories: Notes; Mins. worked: 0

Change to schema - date format

Because unix7.uvic.ca can't handle dates earlier than Dec 14, 1901, the proprietary date format in the XML files - currently 1759Dec31 - will need to be changed to a standard YYYY-MM-DD format, i.e. 1759-12-31. The XML -> MySQL import script can't do any date processing, so the date needs to be in the proper format in the original data.
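Converting the old format is a pure string operation, so it needs no timestamps at all. The Python sketch below assumes the old values always look like 1759Dec31: a four-digit year, an English three-letter month and a one- or two-digit day. It uses datetime.date only to reject impossible dates. Python's date type covers years 1 to 9999, so 18th-century dates are not a problem:

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

OLD_FORMAT = re.compile(r"(\d{4})([A-Z][a-z]{2})(\d{1,2})")


def to_iso_date(old_value: str) -> str:
    """'1759Dec31' -> '1759-12-31'; raises ValueError on anything else."""
    match = OLD_FORMAT.fullmatch(old_value.strip())
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"unrecognized date: {old_value!r}")
    year, month, day = int(match.group(1)), MONTHS[match.group(2)], int(match.group(3))
    # date() raises ValueError for impossible dates such as 1759Feb30.
    return date(year, month, day).isoformat()
```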

Permalink 12:04:28 pm, by jamie, 65 words, 118 views   English (CA)
Categories: Notes; Mins. worked: 0

unix.uvic.ca can't generate timestamps before Dec 14, 1901

The unix.uvic.ca machines - at least, the ones I've used - have a 32-bit architecture. Because of this, PHP's timestamps (signed 32-bit integers counting seconds since 1 January 1970) are limited to the range Fri, 13 Dec 1901 20:45:54 GMT to Tue, 19 Jan 2038 03:14:07 GMT (see http://uk3.php.net/manual/en/function.date.php ). On a 64-bit architecture, the range is effectively unlimited.

This limitation poses a problem because the Bailey data importer deals exclusively with dates before 1901.
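The limit follows directly from storing seconds since 1970-01-01 UTC in a signed 32-bit integer. The arithmetic below reproduces it in Python, whose datetime arithmetic has no such limit. Note that it gives 20:45:52 UTC as the earliest instant, two seconds earlier than the figure quoted above. Either way, a date like 1759-12-31 is far out of range:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def from_timestamp(seconds: int) -> datetime:
    """Pure epoch arithmetic (no OS call), so negative values are fine."""
    return EPOCH + timedelta(seconds=seconds)


def fits_in_int32(moment: datetime) -> bool:
    """Would this instant survive as a signed 32-bit Unix timestamp?"""
    seconds = int((moment - EPOCH).total_seconds())
    return INT32_MIN <= seconds <= INT32_MAX


print(from_timestamp(INT32_MIN))  # 1901-12-13 20:45:52+00:00
print(from_timestamp(INT32_MAX))  # 2038-01-19 03:14:07+00:00
print(fits_in_int32(datetime(1759, 12, 31, tzinfo=timezone.utc)))  # False
```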

05/08/11

Permalink 04:09:44 pm, by jamie, 31 words, 231 views   English (CA)
Categories: Notes; Mins. worked: 0

How to toggle a boolean or tinyint field in MySQL

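If you have a boolean or tinyint(1) field in MySQL and you need to toggle its value without knowing in advance what the current value is:

UPDATE my_table SET my_field = NOT my_field;

(my_table and my_field are placeholders; "table" itself is a reserved word in MySQL. MySQL also accepts !my_field as a synonym for NOT my_field. Rows where the field is NULL stay NULL.)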
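The toggle can be checked quickly with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python. Its NOT behaves the same way on 0/1 values and leaves NULL as NULL, but this is a demonstration, not a MySQL test. The table and column names are made up:

```python
import sqlite3
from typing import Optional


def toggle_demo() -> list[tuple[int, Optional[int]]]:
    """Create a throwaway table, flip every flag, and return (id, flag) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, is_flagged INTEGER)")
        conn.executemany(
            "INSERT INTO records (id, is_flagged) VALUES (?, ?)",
            [(1, 0), (2, 1), (3, None)],
        )
        # Flip the value without reading it first; NOT NULL stays NULL.
        conn.execute("UPDATE records SET is_flagged = NOT is_flagged")
        return conn.execute("SELECT id, is_flagged FROM records ORDER BY id").fetchall()
    finally:
        conn.close()


print(toggle_demo())  # [(1, 1), (2, 0), (3, None)]
```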

26/07/11

Permalink 09:12:44 am, by jamie, 8 words, 133 views   English (CA)
Categories: Notes; Mins. worked: 0

1760-69 proofed

Received the proofed 1760-69 data from SD, 100% valid.

25/07/11

Permalink 03:19:16 pm, by jamie, 191 words, 136 views   English (CA)
Categories: Notes; Mins. worked: 0

eXist rebuild cancelled: PHP version reinstated with planned XML import

I've decided to scrap the eXist rebuild for a few reasons:

The XML schema was not meant as a final destination for the data. SA created it as an ad hoc intermediary step between SD's raw data and the MySQL database. So, the schema doesn't lend itself well to being the backbone of an eXist architecture, particularly when it comes to searching and generating human-readable values. The structure of the data is highly relational and is a good fit for SQL, which, of course, was SA's original intention.

That said, my original motivation for the eXist rebuild still stands. The current method of translating the XML to SQL involves using XSLT transformations to generate SQL statements. However, every time the schema is updated with new structure or values - which is happening a lot - the XSLT stylesheets need to be updated as well. So, to get around these difficulties, I'm going to write a PHP script that will convert the XML to MySQL, likely using SimpleXML. It won't be the fastest script I've ever written, but since it will be a one-use-per-dataset kind of thing, speed isn't a big issue.
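Below is a rough sketch of that approach. It uses Python's xml.etree.ElementTree as a stand-in for PHP's SimpleXML, and sqlite3 as a stand-in for MySQL. The element and column names (trial, defendant, crime, outcome) are invented for the example, since the real schema isn't reproduced in this post. Child elements are mapped to columns by name, so a new allowed value passes straight through, and a new field means adding one entry to FIELDS rather than editing a stylesheet:

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<trials>
  <trial id="1"><defendant>John Doe</defendant><crime>Robbery</crime>
    <outcome>Executed</outcome></trial>
  <trial id="2"><defendant>Jane Roe</defendant><crime>Forgery</crime>
    <outcome>Pardoned</outcome></trial>
</trials>"""

FIELDS = ("defendant", "crime", "outcome")  # hypothetical element/column names


def import_trials(xml_text: str, conn: sqlite3.Connection) -> int:
    """Insert one row per <trial> element; returns the number of rows added."""
    root = ET.fromstring(xml_text)
    rows = []
    for trial in root.iter("trial"):
        # findtext returns None for a missing child, which becomes SQL NULL.
        values = [trial.findtext(field) for field in FIELDS]
        rows.append((int(trial.get("id")), *values))
    placeholders = ", ".join("?" for _ in range(len(FIELDS) + 1))
    conn.executemany(f"INSERT INTO trials VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id INTEGER PRIMARY KEY, defendant TEXT, crime TEXT, outcome TEXT)")
print(import_trials(SAMPLE, conn))  # 2
```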

20/07/11

Permalink 12:40:55 pm, by jamie, 110 words, 132 views   English (CA)
Categories: Notes; Mins. worked: 0

eXist rebuild planned

After some consultation with Greg, I've decided to scrap the PHP/MySQL version of Bailey that I built and write the site in eXist. After building FrancoToile and the Lansdowne Lecture site in eXist, I'm comfortable working with it, and this way the Bailey XML data can be used directly without being shoehorned into a MySQL database. The allowed values for crimes, outcomes, judges, etc. are constantly changing in the schema, so maintaining those keys in MySQL would be a long-term pain. Using eXist cuts out the XML-to-MySQL middle man. I don't expect the basic functionality of the site to take long - it'll likely be done sometime next week.

13/07/11

Permalink 11:10:47 am, by jamie, 28 words, 1141 views   English (CA)
Categories: Notes; Mins. worked: 0

Beginning restructuring of SQL schema

Because of the laundry list of changes to the XML structure, I've begun a wholesale restructuring of the MySQL database schema. There are quite a few core changes.

Permalink 09:56:32 am, by jamie, 12 words, 1141 views   English (CA)
Categories: Notes; Mins. worked: 0

1770-79 data proofed and received

SD sent back the 1770-79 XML file this morning, which is 100% valid.

12/07/11

Permalink 01:19:35 pm, by jamie, 51 words, 1148 views   English (CA)
Categories: Notes; Mins. worked: 0

1770-79 data received; change in spreadsheet format

Received the 1770-79 XLS spreadsheet from SD. There's a minor change to the format: he's added "Outcome Durn - Yrs" and "Outcome Durn - Other" columns to handle outcome duration values (see previous blog post: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8333&more=1&c=1&tb=1&pb=1 ).


Capital Trials at the Old Bailey

Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases heard at the Old Bailey court in London between 1710 and 1840. This project will create a web-based database that will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.
