XML file sent to SD for proofing
I've sent the 1800 decade XML data file to SD for an initial proofing. Stewart did the bulk of the raw-to-XML transformation using a series of GREP statements. For this final transformation, I used XSLT to convert raw text to 'normalized' values that match against the RNG validation file. Here's how the validation currently breaks down:
- The XML file has 41535 lines
- 116 of these lines (about 0.28%) do NOT validate
- Of these errors, most (100 of 116) are either empty jury_type elements or empty crime_text elements
- The rest of the errors are mainly due to crimes or outcomes that didn't fit into any category (at least, to my untrained eye)
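As a rough cross-check on the figures above, here is a minimal Python sketch of how the empty elements could be counted and the failure rate computed. It is not part of the actual workflow, which uses XSLT and the RNG file. The sample document, the trials and trial wrapper elements, and the helper names count_empty and percent are all assumptions made for illustration; only the jury_type and crime_text element names come from the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real file's structure is not shown in this post,
# so the <trials>/<trial> wrappers and the values are illustrative only.
SAMPLE = """<trials>
  <trial><jury_type>petty</jury_type><crime_text>theft</crime_text></trial>
  <trial><jury_type/><crime_text>assault</crime_text></trial>
  <trial><jury_type>grand</jury_type><crime_text>  </crime_text></trial>
</trials>"""


def count_empty(root, tags=("jury_type", "crime_text")):
    """Count elements per tag that have no children and no non-blank text."""
    counts = {tag: 0 for tag in tags}
    for tag in tags:
        for el in root.iter(tag):
            if len(el) == 0 and not (el.text or "").strip():
                counts[tag] += 1
    return counts


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


root = ET.fromstring(SAMPLE)
print(count_empty(root))
# Failure rate for the figures quoted in the list above.
print(f"{percent(116, 41535):.2f}% of lines do not validate")
```

For the real file, ET.parse on the file path would replace ET.fromstring, and the counts could then be compared with the validator's error report.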
The remaining errors will need a 'trained' eye to be fixed on a case-by-case basis. Once SD is happy with the data, he'll return it and I'll perform a further transformation to import it into the MySQL database (for which the XSLT stylesheets have, luckily, already been written).
I've also spent some time this morning making sure that the XSL files used in the transformation are well documented.