While working on the import process to bring the XML data into the website database, it occurred to me that, after all of the changes to the schema, the <case> element doesn't serve much of a function anymore. While it used to group together multiple trial files, all of the useful grouping information (namely the trial dates) has been moved to the <trial_file> elements themselves. So, we don't need the extra level in the hierarchy.
For that matter, we also don't need the <regular_cases> element that wraps up the <case> elements, since it's also an extra level of hierarchy. The schema really just needs to be: data -> one or more trial files
Thus, I've removed <regular_cases> <case> from the schema and modified the XSLT transformation scripts accordingly. I also modified the data that's already been processed.
These changes will make the PHP script that converts the XML to MySQL simpler.
I've decided to scrap the eXist rebuild for a few reasons:
The XML schema was not meant as a final destination for the data. SA created it as an ad hoc intermediary step between SD's raw data and the MySQL database. So, the schema doesn't lend itself well to being the backbone of an eXist architecture, particularly when it comes to searching and generating human-readable values. The structure of the data is highly relational and is a good fit for SQL, which, of course, was SA's original intention.
That said, my original motivation for the eXist rebuild still stands. The current method of translating the XML to SQL involves using XSLT transformations to generate SQL statements. However, every time the schema is updated with new structure or values - which is happening a lot - the XSLT stylesheets need to be updated as well. So, to get around these difficulties, I'm going to write a PHP script that will convert the XML to MySQL, likely use SimpleXML. It won't be the fastest script I've ever written, but since it will be a one-use-per-dataset kind of thing, speed isn't a big issue.
After some consultation with Greg I've decided to scrap the PHP/MySQL version of Bailey that I built and write the site in eXist. After building FrancoToile and the Lansdowne Lecture site in eXist I'm comfortable working with it, and this way the Bailey XML data can be used directly without being shoehorned into a MySQL database. The allowed values for crimes, outcomes, judges, etc. is constantly changing in the schema, so maintaining those keys in MySQL would be a long-term pain. Using eXist cuts out the XML-to-MySQL middle man. I don't expect the basic functionality of the site to take long - it'll likely be done next week sometime.
Received the 1770-79 XLS spreadsheet from SD. Minor change to the format as he's added "Outcome Durn - Yrs" and "Outcome Durn - Other" to handle outcome duration values (see previous blog post: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8333&more=1&c=1&tb=1&pb=1 ).
Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases between 1710 and 1840 in London heard at the Old Bailey court. This project will create a web-based database which will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.
|<< <||Current||> >>|