Reconciled crimePardonOutcome_lists.txt, BaileyCapitalTrials.txt and recordstructure080301.xml so that they all agree on elements and structure. Open questions:
- which of two crime groupings to use
- normalized respites and respite groups may be redundant, but having both would make respites consistent with crimes and outcomes
Still need to come up with 7 or so major categories for crimes, primarily to make the graphical visualization feasible. The category would be recorded in addition to the specific crime. Ultimately, the visualization tool may allow presentation by crime category and then by crime within a selected category; if so, each category can have no more than 7 to 9 members.
Still need to work out a number of other details of the graphical presentation.
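A minimal Python sketch of the two-level grouping, assuming a simple mapping from category to specific crimes. The category names and crime labels are placeholders, not the project's normalized list. It checks the 7 to 9 member limit at both levels and that each crime sits in exactly one category.

```python
# Hypothetical two-level crime hierarchy: category -> specific crimes.
# Names are placeholders, not the project's normalized crime list.
MAX_MEMBERS = 9  # upper end of the "7 to 9 items per level" target

CRIME_CATEGORIES: dict[str, list[str]] = {
    "property": ["burglary", "highway robbery", "horse theft"],
    "coinage": ["coining", "uttering"],
    "violence": ["murder"],
}


def category_of(crime: str, categories: dict[str, list[str]] = CRIME_CATEGORIES) -> str:
    """Return the category (first display level) for a specific crime."""
    for category, crimes in categories.items():
        if crime in crimes:
            return category
    raise KeyError(f"crime not assigned to a category: {crime!r}")


def check_hierarchy(categories: dict[str, list[str]], limit: int = MAX_MEMBERS) -> list[str]:
    """Return a list of problems; an empty list means the hierarchy is displayable."""
    problems = []
    if len(categories) > limit:
        problems.append(f"{len(categories)} categories exceed the limit of {limit}")
    seen: dict[str, str] = {}
    for name, crimes in categories.items():
        if len(crimes) > limit:
            problems.append(f"category {name!r} has {len(crimes)} members")
        for crime in crimes:
            if crime in seen:
                problems.append(f"crime {crime!r} is in both {seen[crime]!r} and {name!r}")
            seen[crime] = name
    return problems


print(check_hierarchy(CRIME_CATEGORIES))  # []
```

Keeping the hierarchy as plain data means the real category list can be dropped in later and re-checked without touching the visualization code.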
Over the past week, spent time making the show/hide visualization more sophisticated:
- reports the number of visible records for each value in the selected sortBy category (age, sex, crime)
- reports the total numbers of visible and invisible records
- allows the user to select or deselect any checkbox in any column while keeping the proper colouring conventions (by sortBy value) and the visible/invisible counts correct
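The counting logic in the list above can be sketched as follows (in Python, although the page itself runs in the browser). It assumes a record is visible only if none of its values has been unchecked in any column, and that colours are assigned from all values of the sortBy field rather than only the visible ones, so a value keeps its colour when others are hidden. Field names and sample values are hypothetical.

```python
from collections import Counter

# Hypothetical records; the real page reads these from the trial data.
RECORDS = [
    {"age": "20-29", "sex": "M", "crime": "burglary"},
    {"age": "30-39", "sex": "F", "crime": "coining"},
    {"age": "20-29", "sex": "F", "crime": "burglary"},
    {"age": "40-49", "sex": "M", "crime": "murder"},
]


def is_visible(record: dict, hidden: dict[str, set[str]]) -> bool:
    """A record is shown only if none of its values is unchecked in any column."""
    return not any(record[col] in values for col, values in hidden.items())


def summarize(records: list[dict], hidden: dict[str, set[str]], sort_by: str):
    """Return (visible counts per sort_by value, total visible, total invisible)."""
    visible = [r for r in records if is_visible(r, hidden)]
    counts = Counter(r[sort_by] for r in visible)
    return counts, len(visible), len(records) - len(visible)


def colour_map(records: list[dict], sort_by: str, palette: list[str]) -> dict[str, str]:
    """Assign colours from ALL values of sort_by, so hiding records never shifts colours."""
    values = sorted({r[sort_by] for r in records})
    return {value: palette[i % len(palette)] for i, value in enumerate(values)}


counts, shown, hidden_total = summarize(RECORDS, {"sex": {"F"}}, sort_by="crime")
print(dict(counts), shown, hidden_total)  # {'burglary': 1, 'murder': 1} 2 2
```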
Still to be done:
- quantify axes
Improvements to consider:
- for each checkbox set: show/hide all, toggle all
- generate an SQL query from the current settings of the visualization that returns the records currently visible
- allow the current state of the visualization page to be saved as an HTML bookmark
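Both of the last two improvements could work from the same visibility state. A rough sketch: turn the unchecked values into a parameterized SQL query, and encode them in a URL query string so the page state can be bookmarked. The table name trial and the columns age, sex and crime are assumptions, and the %s placeholders follow the parameter style of common Python MySQL drivers.

```python
from urllib.parse import parse_qs, urlencode

# Assumed names: the real table and column names may differ.
TABLE = "trial"
FILTER_COLUMNS = ("age", "sex", "crime")


def visible_records_query(hidden: dict[str, set[str]]) -> tuple[str, list[str]]:
    """Build a parameterized query returning the records the page currently shows.

    Only whitelisted column names go into the SQL text; the unchecked
    values themselves are passed separately as parameters.
    """
    clauses, params = [], []
    for col in FILTER_COLUMNS:
        values = sorted(hidden.get(col, ()))
        if values:
            placeholders = ", ".join(["%s"] * len(values))
            # Note: NOT IN also drops rows where this column is NULL.
            clauses.append(f"{col} NOT IN ({placeholders})")
            params.extend(values)
    sql = f"SELECT * FROM {TABLE}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def state_to_query_string(hidden: dict[str, set[str]]) -> str:
    """Encode the unchecked values as a URL query string for bookmarking."""
    pairs = [(col, v) for col in FILTER_COLUMNS for v in sorted(hidden.get(col, ()))]
    return urlencode(pairs)


def query_string_to_state(query: str) -> dict[str, set[str]]:
    """Restore the visibility state from a bookmarked query string."""
    parsed = parse_qs(query)
    return {col: set(values) for col, values in parsed.items() if col in FILTER_COLUMNS}


state = {"sex": {"F"}, "crime": {"murder", "highway robbery"}}
print(visible_records_query(state))
print(state_to_query_string(state))
```

With a driver such as PyMySQL, the returned pair would be passed as cursor.execute(sql, params). Storing hidden values rather than visible ones keeps the default state (everything shown) as an empty query string.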
Over the past few days, spent time fine-tuning the filter-based version of the graphical representation of the Old Bailey court trials (i.e. the buttons within a group are mutually exclusive) and creating a new hide/show-based version of the same information (the buttons are not mutually exclusive). The hide/show controls give the user finer control over what is displayed, are more flexible and easier to code, and I suspect are more intuitive to use, so they will likely be the model I continue with.
Not sure yet how the whole thing will scale to thousands of records spanning 150 years, or whether to use JavaScript or PHP (with an Ajax call from the page).
Updated the table and field specification based on input from Simon and on Clifton's table structure. Will review the BaileyCapitalTrials.txt file with Clifton, and then we can make the necessary modifications to the MySQL tables themselves.
Waiting on Simon to provide lists of normalized crimes, respites, and outcomes. We'll then either use the data Simon provides or generate our own from the 1790s records we have.
Under Stewart's direction, he and I reworked the Bailey project database to use a 1-N relationship between the recorder_report and trial tables instead of an N-N relationship. As a result, where two recorder's reports exist for one trial, the database uses a single recorder_report record to store both.
In the recorder_report table, report_start_date contains the publication date of the recorder's report. If there is a second recorder's report, the report_end_date field records its publication date. If report_start_date and report_end_date are the same, there is only one recorder's report.
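A self-contained sketch of the 1-N layout and the start/end date convention, using Python's built-in sqlite3 rather than MySQL purely so it runs anywhere. Column names other than report_start_date and report_end_date, and the sample dates, are made up.

```python
import sqlite3

# Illustrative schema: one recorder_report row can cover many trials (1-N).
# Column names besides report_start_date/report_end_date are assumptions.
SCHEMA = """
CREATE TABLE recorder_report (
    report_id         INTEGER PRIMARY KEY,
    report_start_date TEXT NOT NULL,  -- publication date of the recorder's report
    report_end_date   TEXT NOT NULL   -- date of a second report, else same as start
);
CREATE TABLE trial (
    trial_id  INTEGER PRIMARY KEY,
    report_id INTEGER NOT NULL REFERENCES recorder_report(report_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO recorder_report VALUES (?, ?, ?)",
        [(1, "1790-01-10", "1790-01-10"),   # single report
         (2, "1790-03-05", "1790-04-02")],  # two reports stored in one row
    )
    conn.executemany("INSERT INTO trial VALUES (?, ?)", [(101, 1), (102, 1), (103, 2)])
    return conn


def report_dates_for_trial(conn: sqlite3.Connection, trial_id: int) -> list[str]:
    """Return one date if there was a single report, two if there was a second."""
    row = conn.execute(
        "SELECT r.report_start_date, r.report_end_date "
        "FROM trial AS t JOIN recorder_report AS r ON r.report_id = t.report_id "
        "WHERE t.trial_id = ?",
        (trial_id,),
    ).fetchone()
    if row is None:
        raise KeyError(trial_id)
    start, end = row
    # Equal dates mean there was only one recorder's report.
    return [start] if start == end else [start, end]
```

Storing the same date twice, rather than leaving report_end_date NULL, keeps the "one report or two" test a simple equality comparison.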
I attempted to write a PHP script that would read either a tab-delimited text file or Stewart's .xml file of the Bailey records and generate SQL statements, but it proved far too difficult.
Earlier this week I worked with Martin to develop an .xsl file that transforms Stewart's .xml file into an .sql file. After cross-checking the results against the original .xml, I found that the transformation was successful.
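For comparison only, the same XML-to-SQL step can be sketched with Python's standard xml.etree.ElementTree instead of XSLT. This is not the stylesheet described above; the element and attribute names (recorder_report, trial, id, start, end, defendant) are hypothetical stand-ins for the real record structure, and the string escaping assumes MySQL's default SQL mode.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def sql_literal(value: Optional[str]) -> str:
    """Quote a value for MySQL's default SQL mode (backslash is an escape character)."""
    if value is None:
        return "NULL"
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def xml_to_sql(xml_text: str) -> list[str]:
    """Turn hypothetical <recorder_report>/<trial> elements into INSERT statements."""
    root = ET.fromstring(xml_text)
    statements = []
    for report in root.iter("recorder_report"):
        report_id = int(report.get("id"))
        statements.append(
            "INSERT INTO recorder_report (report_id, report_start_date, report_end_date) "
            f"VALUES ({report_id}, {sql_literal(report.get('start'))}, "
            f"{sql_literal(report.get('end'))});"
        )
        # Each nested trial points back at its report: the 1-N relationship.
        for trial in report.iter("trial"):
            statements.append(
                "INSERT INTO trial (trial_id, report_id, defendant) "
                f"VALUES ({int(trial.get('id'))}, {report_id}, "
                f"{sql_literal(trial.findtext('defendant'))});"
            )
    return statements
```

Writing the statements to a .sql file is then a matter of joining them with newlines; a missing element becomes NULL rather than an empty string.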
To take this project further, I will be creating a web application that lets users search this data.
Clifton and I did some extreme programming to turn the XML into an SQL database, using XSLT to generate the SQL. It apparently worked first time (Clifton is now checking the data integrity).
After spending most of three or four days on it, with occasional help from Greg and Clifton, I produced a well-formed XML structure and a tab-delimited file derived from the original RTF file for the 1790s capital trials. There were 80 recorder's reports covering 757 trials.
About 35 of the recorder report lines have some content in the "miscellaneous text" element/field, so those will have to be edited by hand. At least one of those lines includes a reference to a second recorder's report.
About 250 of the trials have some content in the "miscellaneous text" element/field, so those will also need manual editing. I have no idea yet how many trials have syntactically valid but nonsensical field values.
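A quick audit along these lines could confirm well-formedness, check the report and trial counts against the expected 80 and 757, and list the records whose miscellaneous text still needs hand editing. It is a sketch assuming hypothetical recorder_report and trial elements with an id attribute and a misc_text child; the real element names may differ.

```python
import xml.etree.ElementTree as ET


def ids_with_misc_text(elements: list[ET.Element]) -> list[str]:
    """IDs of elements whose direct <misc_text> child is non-blank (needs hand editing)."""
    return [el.get("id") for el in elements if (el.findtext("misc_text") or "").strip()]


def audit(xml_text: str) -> dict:
    """Check well-formedness, count reports and trials, and flag misc text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError(f"XML is not well-formed: {err}") from err
    reports = root.findall(".//recorder_report")
    trials = root.findall(".//trial")
    return {
        "reports": len(reports),
        "trials": len(trials),
        "reports_with_misc": ids_with_misc_text(reports),
        "trials_with_misc": ids_with_misc_text(trials),
    }
```

This only catches structural problems and flagged text; syntactically valid but nonsensical values would still need separate, field-specific checks.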
I removed the list of "respited larcenies" and have not dealt with them yet.