Received the 1770-79 XLS spreadsheet from SD. Minor change to the format as he's added "Outcome Durn - Yrs" and "Outcome Durn - Other" to handle outcome duration values (see previous blog post: http://hcmc.uvic.ca/blogs/index.php?blog=36&p=8333&more=1&c=1&tb=1&pb=1 ).
Received the proofed 1780-84 data file from SD. After adding "HighTreason" as a crime_normalized, the file has 100% validation.
As of today, these are the data sets that are complete (i.e. use the latest schema and are 100% valid):
- 1780-84
- 1785-89
- 1790-99
- 1800-09
Another schema change for the data: we're going to utilize the hitherto ignored outcome_duration element for outcomes. Until this point, normalized outcomes (i.e. the outcome_normalized element) with years attached, such as "transported for 1 year", "hulks for five years", etc. were simply listed as T1, Hulks5, etc. But, because the list of T# and Hulks# normalized outcomes is growing, SD and I have decided to move the numbers from the outcome_normalized element into the outcome_duration element where they belong.
Furthermore, to allow for flexibility when entering the duration, the outcome_duration element has been expanded with a number of child elements. These could also be attributes, but I've chosen to use elements since everything else in the schema uses elements rather than attributes. The modified element looks like this:
<outcome_duration>
    <years>(number of years, or blank for none)</years>
    <months>(number of months, or blank for none)</months>
    <weeks>(number of weeks, or blank for none)</weeks>
    <days>(number of days, or blank for none)</days>
    <other>(Life, Remainder, or blank)</other>
</outcome_duration>
I also changed the acceptable values of outcome_normalized so that, instead of accepting T#, Hulks#, and GB#, "Transport", "Hulks", and "GoodBehavior" are the new accepted values. Also, "SelfTL" and "SelfTR" have become "SelfTransport" with either "Life" or "Remainder" in the outcome_duration/other.
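To make the mapping concrete, here is a rough sketch of how an old-style code could be split into the new pair of values. The function and variable names are hypothetical (this is not the actual transformation used on the data), and it assumes the trailing number in T#, Hulks#, and GB# always means years:

```javascript
// Hypothetical helper, not part of the project's transformations.
// Assumption: the number in T#, Hulks#, and GB# is a count of years.
var LEGACY_PREFIXES = { T: 'Transport', Hulks: 'Hulks', GB: 'GoodBehavior' };
var LEGACY_SELF = { SelfTL: 'Life', SelfTR: 'Remainder' };

// An outcome_duration with every child element blank.
function emptyDuration() {
    return { years: '', months: '', weeks: '', days: '', other: '' };
}

function convertLegacyOutcome(code) {
    var duration = emptyDuration();

    // SelfTL / SelfTR become SelfTransport plus a value in "other".
    if (Object.prototype.hasOwnProperty.call(LEGACY_SELF, code)) {
        duration.other = LEGACY_SELF[code];
        return { outcome_normalized: 'SelfTransport', outcome_duration: duration };
    }

    // T3, Hulks5, GB2, etc.: move the number into "years".
    var match = /^(T|Hulks|GB)(\d+)$/.exec(code);
    if (match) {
        duration.years = String(parseInt(match[2], 10));
        return { outcome_normalized: LEGACY_PREFIXES[match[1]], outcome_duration: duration };
    }

    // Anything else is passed through unchanged with a blank duration.
    return { outcome_normalized: code, outcome_duration: duration };
}
```

With this sketch, convertLegacyOutcome('T3') gives "Transport" with years set to "3", and convertLegacyOutcome('SelfTL') gives "SelfTransport" with other set to "Life".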
The values of years, months, weeks, and days must each be an integer (or blank), while the value of other must correspond to a value in the setofNormalizedDurationOthers list in the RNG schema.
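The same rules can be sketched outside the schema, for example as a quick check before a file goes through RNG validation. Everything here is illustrative: validateDuration is a hypothetical name, the list of "other" values is assumed to be just the two mentioned above, and the check is a little stricter than the schema in that it accepts only non-negative whole numbers:

```javascript
// Assumed contents of setofNormalizedDurationOthers; only these two values
// are named in this post, so the real list in the schema may differ.
var DURATION_OTHERS = ['Life', 'Remainder'];

// Hypothetical helper: returns a list of problems (empty if the duration is OK).
function validateDuration(duration) {
    var errors = [];

    ['years', 'months', 'weeks', 'days'].forEach(function (unit) {
        var value = duration[unit] == null ? '' : String(duration[unit]).trim();
        // Blank is allowed; otherwise digits only (non-negative, by assumption).
        if (value !== '' && !/^\d+$/.test(value)) {
            errors.push(unit + ' must be an integer or blank: "' + value + '"');
        }
    });

    var other = duration.other == null ? '' : String(duration.other).trim();
    if (other !== '' && DURATION_OTHERS.indexOf(other) === -1) {
        errors.push('other must be one of ' + DURATION_OTHERS.join(', ') + ' or blank: "' + other + '"');
    }

    return errors;
}
```

An empty array back means the duration passes; anything else lists what to fix, one message per offending child element.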
Spent quite a bit of time figuring out how to add totals to the x-axis labels on the Flot chart (Flot calls these "ticks"), so that each tick displays the total number of items for its year along with the year number, which was already being displayed. Since I'm using the Flot stack plugin to display multiple sets of data at each tick, it was a bit of a complex process.
The most time-consuming part of the process was finding a "Flot-y" way to do it. The API documentation is OK, but a bit too informal for my tastes, and it glosses over a lot of things. After a lot of testing, debugging, and stepping through functions in the Chrome developer tools, I couldn't find any native Flot function for ticks that would do the job. There are plenty of ways to format the data already in the tick (i.e. the year number in this case), but not to add data to the tick. So, I declared a global variable, var yearTotals = [], and, in the success function that's triggered after the AJAX call that grabs the JSON data for the chart, I added a for loop to get the totals for each year and add them to the array:
function onDataReceived(series) {
    $.each(series, function(index, value) {
        // One Flot series per key in the JSON response
        var entry = {};
        entry['label'] = index;
        entry['data'] = value.data;
        chartData.push(entry);
        // Each row is [year, count]; accumulate the count into that year's total
        for (var i = 0; i < value.data.length; i++) {
            var row = value.data[i];
            if (!yearTotals[row[0]]) {
                yearTotals[row[0]] = row[1];
            } else {
                yearTotals[row[0]] += row[1];
            }
        }
    });
    plotWithOptions(chartData);
}
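The snippet above stops at collecting the totals. One possible way to get them into the tick labels is through the tickFormatter option on the x-axis, which Flot calls with the tick value and the axis object. This is only a sketch under that assumption: makeYearTickFormatter is a hypothetical name, and the "year (total)" label format is just for illustration.

```javascript
// Hypothetical helper: builds a tick formatter that looks up the total for
// each year and appends it to the label. "totals" can be the yearTotals array
// (indexed by year) or a plain object keyed by year.
function makeYearTickFormatter(totals) {
    return function (val, axis) {
        var total = totals[val];
        // Years with no recorded total keep the plain year label.
        return total === undefined ? String(val) : val + ' (' + total + ')';
    };
}

// Possible use in the plot options (assumes Flot and the yearTotals global):
// var options = { xaxis: { tickFormatter: makeYearTickFormatter(yearTotals) } };
```

Keeping the lookup in a small factory function means the formatter can be tried out without a chart on the page, which is handy when debugging the totals themselves.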
Now that SD is putting his data into Excel spreadsheets, the import procedure is (thankfully!) a lot simpler than the first process:
Step 1:
- Use Oxygen to convert Excel spreadsheet to XML (File -> Import -> MS Excel File); ensure that "First row contains field names" is checked
- Save as step_1.xml
Step 2:
- Transform step_1.xml with transformations/1_to_2.xsl
- Save as step_2.xml
Step 3:
- Transform step_2.xml with transformations/2_to_3.xsl
- Add RNG schema line to the top of the file:
<?oxygen RNGSchema="http://web.uvic.ca/~lang02/bailey/schema/bailey_trialfile_proofing.rng" type="xml"?>
- Validate and correct any errors. Given the variable nature of the data, 100% validation probably won't be possible at this point
- Save as step_3.xml
- Send the file to SD for proofing/final validating
Judges:
Adair-J
Gould-H
Hotham-N
Nares-G
Willes-J
Willes-E
Normalized outcome:
T3
Respite delays:
ThreeDays
TenDays