Met with SD to confirm his categorization. He will manually change instances of "capital theft" and "theft" to "St in Dwelling" or "Rob in Dwelling" as needed. Any remaining instances of "capital theft" or "theft" will go into the MISCELLANEOUS category. Instances of "extortion" will go into the CURRENCY category.
Arson : MISCELLANEOUS
Burglary : BURGLARY
Cattle : ANIMAL THEFT
Coining : CURRENCY
Extortion : CURRENCY
Forgery : CURRENCY
Horse : ANIMAL THEFT
Housebreak : BURGLARY
Impersonation : CURRENCY
Murder : MURDER
Pickpocket : MISCELLANEOUS
Rape : SEXUAL ASSAULT
Returning : MISCELLANEOUS
Riot : MISCELLANEOUS
Rob in Dwelling : ROBBERY
Robbery : ROBBERY
Sheep : ANIMAL THEFT
Shoplift : ST IN SHOP
Shooting : MISCELLANEOUS
Sodomy : SEXUAL ASSAULT
St in Dwelling : ST IN DWELLING
St in Mail : MISCELLANEOUS
St on Thames : MISCELLANEOUS
[other values] : NOTKNOWN - to be manually edited by proofer
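A minimal sketch of how the list above could be applied in code. The names CATEGORY_BY_OFFENCE and categorize are hypothetical, and the matching rule (trim, collapse spaces, ignore case, then exact match) is an assumption, not something agreed with SD. The two theft entries come from the note at the top of this entry rather than from the list itself.

```python
# Source value -> category, copied from the list above.
# Keys are lower-cased so that matching can ignore case (an assumption).
CATEGORY_BY_OFFENCE = {
    "arson": "MISCELLANEOUS",
    "burglary": "BURGLARY",
    "cattle": "ANIMAL THEFT",
    "coining": "CURRENCY",
    "extortion": "CURRENCY",
    "forgery": "CURRENCY",
    "horse": "ANIMAL THEFT",
    "housebreak": "BURGLARY",
    "impersonation": "CURRENCY",
    "murder": "MURDER",
    "pickpocket": "MISCELLANEOUS",
    "rape": "SEXUAL ASSAULT",
    "returning": "MISCELLANEOUS",
    "riot": "MISCELLANEOUS",
    "rob in dwelling": "ROBBERY",
    "robbery": "ROBBERY",
    "sheep": "ANIMAL THEFT",
    "shoplift": "ST IN SHOP",
    "shooting": "MISCELLANEOUS",
    "sodomy": "SEXUAL ASSAULT",
    "st in dwelling": "ST IN DWELLING",
    "st in mail": "MISCELLANEOUS",
    "st on thames": "MISCELLANEOUS",
    # Leftovers after SD's manual pass, per the note at the top of this entry.
    "capital theft": "MISCELLANEOUS",
    "theft": "MISCELLANEOUS",
}

# Anything not listed is flagged for the proofer to edit by hand.
DEFAULT_CATEGORY = "NOTKNOWN"


def categorize(offence: str) -> str:
    """Return the category for a source offence value, or NOTKNOWN."""
    key = " ".join(offence.split()).lower()  # trim and collapse whitespace
    return CATEGORY_BY_OFFENCE.get(key, DEFAULT_CATEGORY)
```

Keeping the mapping in one table means a later change of category only has to be made in one place, and the NOTKNOWN fallback leaves unexpected values visible instead of guessing at them.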
Over the past couple of weeks, I've been doing further tests with three types of visualizations of the Bailey data:
1) Frequency (each trial gets a box, x-axis is years, y-axis is number of trials, user can colour or hide/show trials by a range of attributes)
2) Motion (trials aggregated by year; the primary key is one attribute (e.g. crime) and the rest of the fields are numeric counts or percentages (e.g. %male, %age_under_19)). The data is then put into a display box where trends are seen in the position and movement of circles, one for each value of the primary attribute.
3) Clumping (similar data as for motion, but instead of the axes being %age_under_19 and %male, one axis would be age with points for under_20, 20to40, 40+ and the other axis would be gender with one point for male and one for female). The area of the circle at each intersection would correspond to the number of trials in that year with the appropriate values for the two axes.
#1 will be the primary visualization as it provides both immediate access to individual trials and an indication of trends over the set. #2 will be the secondary visualization and may appeal to people whose learning style is more synthetic/right brain and less analytic/left brain. #3 doesn't get me anything I can't get from #2 and would be a huge amount of work, since I would have to create the display engine.
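For reference, the data side of the clumping display (#3) is small; the display engine is where the work would be. A rough sketch of the counts and circle sizes, assuming each trial is a dict with hypothetical keys year, age and gender, and assuming the age bands split at 20 and 40 (the exact boundaries are not settled in these notes):

```python
import math
from collections import Counter


def age_band(age):
    """Map an age to one of the three points on the age axis (assumed cut-offs)."""
    if age < 20:
        return "under_20"
    if age <= 40:
        return "20to40"
    return "40+"


def clump_counts(trials, year):
    """Count trials in one year at each (age band, gender) intersection."""
    return Counter(
        (age_band(t["age"]), t["gender"]) for t in trials if t["year"] == year
    )


def radius_for(count, scale=1.0):
    """Radius of a circle whose AREA (not radius) is scale * count."""
    return math.sqrt(scale * count / math.pi)
```

Sizing by area rather than radius matters: a cell with four times the trials gets a circle with twice the radius, so the visible area stays proportional to the count.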
Created a tab-delimited file of the records I've been using for my frequency visualization. Each of those has a unique id in the first field, so of course no trends could be presented by the software, but I could get a snapshot of the distribution of trials on the selected variables at each year.
Then decided to focus on crimes, so created a table with 1 of the 8 types of crimes in the first field of each record, repeated for each of the ten years in the sample set. Then I had to aggregate the number of e.g. murders in 1790, and then work out the percentage of that number which had male for the suspect's gender, L for the jury code, age<20, age21-40, age40+ etc.
So, you can no longer get at each individual trial, but only at the group of trials that match the up to 4 attributes (2 axes, colour, size of ball) that you can filter by. That may be 1 or it may be more than 1 in a given year. On the other hand, you can now see trends, e.g. for each crime, what the gender breakdown is for each year and how that's trending over time.
I'm not yet sure whether the kind of aggregating required to get the trends will make the data useless.
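The aggregation described above (one record per crime per year, with a count and percentages) can be sketched roughly as below. This is not the script I actually used: the field names crime, year, gender and age are hypothetical, the age cut-offs at 20 and 40 are an assumption, and the jury code is left out for brevity.

```python
from collections import defaultdict

AGE_BANDS = ("under_20", "20to40", "40+")


def age_band(age):
    """Assumed cut-offs: under 20, 20 to 40 inclusive, over 40."""
    if age < 20:
        return "under_20"
    if age <= 40:
        return "20to40"
    return "40+"


def aggregate(trials):
    """Collapse individual trials into one row per (crime, year)."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t["crime"], t["year"])].append(t)

    rows = []
    for (crime, year), group in sorted(groups.items()):
        n = len(group)
        row = {
            "crime": crime,
            "year": year,
            "trials": n,
            "pct_male": 100.0 * sum(t["gender"] == "male" for t in group) / n,
        }
        for band in AGE_BANDS:
            matching = sum(age_band(t["age"]) == band for t in group)
            row["pct_age_" + band] = 100.0 * matching / n
        rows.append(row)
    return rows
```

Each output row keeps the trial count alongside the percentages, so a percentage based on only one or two trials can still be spotted as such.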
Met with SD to confirm details of SSHRC grant and have initial discussion.
SD has $87,000 over 3 years (28k, 28k, 31k) to fund research on the data (including travel for him to England), preparation of the data, and creation of a website containing a searchable database and a graphical representation of the trials in the dataset.
State of data:
1689 - 1730 limited or no data at present (end of year 3)
1730 - 1810 data needs minor correction and normalization (end of year 1)
1810 - 1837 data needs major work (end of year 2)
"respited larcenies" will likely be included, Simon to consider and decide
Simon will want to add a couple of fields to accommodate links to external sources (images of the original documents etc.)