SD sent me an spreadsheet with the 1820 to 1824 data in it. Found a little "how-to" file which explained the steps to turn that into an xml data file with schema for validating. Did the process and noticed that two of the fields had values swapped. Checked the xslt and sure enough
found this:
<crime_normalized><xsl:value-of select="Crime_Group"/></crime_normalized>
<crime_group><xsl:value-of select="Crime_Normalized"/></crime_group>
which I corrected to this:
<crime_normalized><xsl:value-of select="Crime_Normalized"/></crime_normalized>
<crime_group><xsl:value-of select="Crime_Group"/></crime_group>
Also noticed that the import changed all integer values to floating point, (e.g. 16 became 16.0), and only integers are valid in the various field (age, weeks, months, years etc.) Just did a grep search and replace to fix those.
Huge majority of 100+ remaining invalid instances are mercy appeals where Simon has entered something like jury/prosecutor and the XML requires a separate mercy appeal for each proponent.
XML file now with SD to make remaining corrections, then return to me, at which point I'll follow the rest of the how-to procedure to render back to relational data and upload to db.
Bailey dev frontend is in the lang02 account with an obvious folder name. As of this writing it is identical to the production frontend (in same account) and backup (on my Mac).
Bailey dev backend is on HCMC's db server (production is on a university php server, database's name is baileydev. As of this writing, it is identical to the production db. Backup of that db (an SQL file) is on my Mac).
In testing my modifications to the importer code, I created records which I had to delete. The importer typically adds records to these tables:
criminals
outcomes
respites
trials
crimes
trial_files
and may add records to these tables:
aliases
judge_respites
outcome_durations
execution_specials
mercy_appeals
When deleting spurious test records, to avoid foreign-key violation errors, I started with these tables:
crimes
trial_files
criminals
Figured out the problem with the code in the Importer class, which Jamie confirmed. He had blogged the point of the code here. I missed that posting, but still would have need confirmation.
The id attribute in the criminal element can be given a non-Null value (strictly a varchar(31)) by SD when he knows both trials refer to the same individual. Both criminal elements must have identical values for the id attribute. The syncCriminals method in the Importer uses that information to remove the duplicate entry in the criminals table and reconciles criminal_id_fk's in other tables.
The problem was that the tables did not have a field to hold that information. I created a duplicate_label field of type varchar(31), and modified the Importer code to refer to that field.
I discovered a similar (though simpler) problem with the trials field. The importer code expects a case_reference field in that table, but there wasn't one, so I created one of type varchar(7). Jamie's instructions to SD are to use that when trials are in the same court session (), but in different trial_files, and he wants them associated (e.g. for a bunch of criminals tried for the same crime).
I don't see the code that handles that similar to the syncCriminal code, but the upload works.
In the _importCriminal method, this block of code:
if (isset($data->attributes()->id)) {
$xmlId = $data->attributes()->id;
} else {
$xmlId = null;
}
always returns null for $xmlId, as there are no attributes in any
instance of the criminal element in the data.
In the criminals table, there is no field named "xml_id", so any
reference to that field will throw an error.
There is one such reference in _importCriminal:
$new = array(
'surname' => $data->surname,
'given_name' => $data->given_names,
'age' => $data->age,
'gender_id_fk' => array_search(strtolower($data->gender),
$this->genderMap),
'xml_id' => $xmlId
);
$id = $this->tables['criminals']->insert($new);
and two references in one line in _syncCriminals:
$records =
Zend_Db_Table::getDefaultAdapter()->fetchPairs('SELECT criminal_id,
xml_id FROM criminals WHERE xml_id IS NOT NULL');
For the instance in _importCriminal, if I replace 'xml_id' with
'criminal_id' then the record is added to the criminals table, but of
course there is no field 'xml_id' so naturally no value for that field,
so I'm not sure if that's the desired behaviour.
For the instance in _syncCriminals, it makes no sense to simply replace
xml_id with criminal_id. I understand the logic of what syncCriminals is
supposed to do, but I don't see how that code can possibly not throw an
error. That line of code requires there be an xml_id field and a
criminal_id field in each record, but that's not the case.
Is there supposed to be an xml_id field in the criminals table? I can't
find any instance of that table which has such a field. Even if I did,
as far as I can figure, the value in that field would be NULL in every
instance. I have found no instances in the data file of a criminal element with an id attribute.
Wrote to Jamie again to see if any of this jogs his memory about the structure of the dev instance he was using.
When I try to import the data file into the database using the importer.php class Jamie created, I get an error:
Fatal error: Uncaught exception 'PDOException' with message
'SQLSTATE[42S22]: Column not found: 1054 Unknown column 'xml_id' in
'field list'' in
/home3/14/lang02/www/bailey_v2/includes/classes/Zend/Db/Statement/Pdo.php:228
The table 'criminals' in the database (and thus the derived Zend_Db_Table object 'criminals') has these five fields:
criminal_id
surname
given_name
gender_id_fk
age
The importCriminal method in the real file and the testImportXml method in my test file create a variable $new which has 5 fields in it:
surname
given_name
gender_id_fk
age
xml_id
When that array is inserted as a new record in the 'criminals' table, an error is generated. My guess is that's because the criminal_id field is automatically populated so we're trying to put 5 fields of data into a table with 4 fields. There is no field in the table for the "xml_id" value in the array.
The new data file validates, and if I import one of the existing data files I get the same error, so I don't think the data is the problem.
If I comment out the insert statement, no errors in my test method.
If I get rid of the xml_id field in the $new array, no errors in my test method.
Given that the value for the xml_id field is always null (as far as I can figure out as the element 'criminal' in the data has no attributes and specifically not an id attribute), I don't understand why it's being included in the insert at all. I'm guessing that the xml_id is used internally for the duration of processing only. It looks like its intended to be used in the method that resolves duplicate criminals.
public function importXml($file) {
$contents = file_get_contents($file);
$xml = new SimpleXMLElement($contents);
foreach ($xml->trial_file as $i => $trialFile) {
$criminalId = $this->_importCriminal($trialFile->criminal);
.
.
.
}
.
.
.
return true;
}
protected function _importCriminal($data) {
if (isset($data->attributes()->id)) {
$xmlId = $data->attributes()->id;
} else {
$xmlId = null;
}
$new = array(
'surname' => $data->surname,
'given_name' => $data->given_names,
'age' => $data->age,
'gender_id_fk' => array_search(strtolower($data->gender), $this->genderMap),
'xml_id' => $xmlId
);
$id = $this->tables['criminals']->insert($new); /* error thrown here */
$this->_importCriminalAliases($data->aliases->alias, $id);
return $id;
}
I had to find all instances of criminals in which both the given-name and surname fields are identical and the number of such duplicates
Here's the SQL I used for the example:
SELECT `surname`, `given_name`, COUNT(`surname`) AS Instances FROM criminals GROUP BY `surname`, `given_name` HAVING ( COUNT(`surname`) > 1 ) AND ( COUNT(`given_name`) > 1)
and here's the same SQL with abstracted table and field names:
SELECT `field1Name`, `field2Name`, COUNT(`field1Name`) AS Instances FROM tableName GROUP BY `field1Name`, `field2Name` HAVING ( COUNT(`field1Name`) > 1 ) AND ( COUNT(`field2Name`) > 1)
When I put a value into the given-names textfield in the search box and then tried to execute a search I got back an error message which began with the following:
Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42S22]: Column not found: 1054 Unknown column 'c.given' in 'where clause''
It took a couple of hours wading through the form, and the json and zend code working with the form and the data model until I found this line of code in includes/classes/Search.php
protected function _prepareCriminals() {
$this->_select->joinLeft(
array('c' => 'criminals'),
'c.criminal_id = tf.criminal_id_fk',
array('c.criminal_id', 'c.surname', 'c.given_name', 'c.age')
);
...
if ($this->inData('given-name')) {
$this->_select->where('LOWER(c.given-name) LIKE ?', '%' . strtolower($this->_data['given-name']) . '%');
}
I changed one "given-name" to a "given_name" (see below) and the queries now seem to work:
protected function _prepareCriminals() {
$this->_select->joinLeft(
array('c' => 'criminals'),
'c.criminal_id = tf.criminal_id_fk',
array('c.criminal_id', 'c.surname', 'c.given_name', 'c.age')
);
if ($this->inData('given-name')) {
$this->_select->where('LOWER(c.given_name) LIKE ?', '%' . strtolower($this->_data['given-name']) . '%');
}
so, apparently because given-name was not a member of the c array, the instance of LOWER(c.given-name) was interpreted as LOWER(c.given)-name. name of course was null so presumably had no effect, resulting in c.given being the name of the field in the table to search, which of course doesn't exist generating the error I saw.
Previously, if there were more than 20 years in the range of returned data, the years appearing along the x-axis in the flot graph overwrote each other.
I changed the init-flot.js file by replacing this:
xaxis: {
tickSize: 1,
tickFormatter: function(val, axis) {
if (!yearTotals[val]) {
return val;
} else {
return val + '<br />' + yearTotals[val];
}
}
}
with this:
xaxis: {
ticks: 20,
tickDecimals: 0,
tickFormatter: function(val, axis) {
return ' ' + val;
}
}
The ticks limits the number of years explicitly labelled to a maximum of 20, thus taking care of the overlapping year display on the x-axis when there are lots of years in the chart.
The tickDecimals controls whether fractions of years get ticks on the x-axis. For small year ranges, if this is not set to 0, you get values like 1744.5 appearing, which is nonsensical for this dataset.
Since not all years get a tick on the x-axis, it didn't seem to make sense to display the total hits, so I've removed that from the tickFormatter.
I prefer the year label to be left justified to the left side of the column in the graph, rather than centered on the left side of the column, so I added those spaces to push the date to the left.
:: Next Page >>
Simon Devereaux has approximately 10,000 records of people convicted in potentially capital cases between 1710 and 1840 in London heard at the Old Bailey court. This project will create a web-based database which will allow interested researchers and members of the public to compose queries on that data (e.g. women charged with robbery 1710-1720). It must be able to support a range of queries and produce output allowing researchers to identify trends in judicial practice over that time.
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|
| << < | > >> | |||||
| 1 | 2 | 3 | 4 | |||
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| 19 | 20 | 21 | 22 | 23 | 24 | 25 |
| 26 | 27 | 28 | 29 | 30 | 31 | |