As my first project at HCMC, I've been tasked with finishing Stewart's procedure for converting the raw data (as supplied by SD) to well-formed XML. Stewart did an admirable job of turning a tab-delimited document into an almost-finished XML file using GREP and an exhaustive series of regular expressions. However, since the final steps involve populating elements based on the values of other elements (checked against a list of allowable values), I've decided, after talking with Martin and Greg, to use XSLT to finish off the procedure. I don't know XSLT, but since it will be an important part of my work here for the next eight months, this is a good opportunity to learn the language.
So the first step is to take the value of crime_text, figure out which value it corresponds to in the list of allowable crimes in the proofing RNG, and then use that value to populate crime_normalized. Then crime_group will be populated based on the value of crime_normalized. The respite_text and outcome_text values will be massaged in a similar way, though respite has only a respite_normalized element and no respite_group.
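To make the two-step lookup concrete, here is a minimal sketch of the logic in Python rather than XSLT. It is not the stylesheet itself: the crime names, the groups, the `trial` wrapper element, and the `simplify` and `populate` helpers are all hypothetical placeholders, since the real allowable values live in the proofing RNG.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowable crimes mapped to hypothetical groups;
# the real list of allowable values comes from the proofing RNG.
CRIME_GROUPS = {
    "burglary": "theft",
    "highway robbery": "theft",
    "murder": "killing",
}

def simplify(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def populate(record):
    """Fill crime_normalized and crime_group from crime_text, in place."""
    raw = simplify(record.findtext("crime_text", default=""))
    # Only accept a value that appears in the allowable list.
    normalized = raw if raw in CRIME_GROUPS else ""
    record.find("crime_normalized").text = normalized
    # The group depends on the normalized value, not on the raw text.
    record.find("crime_group").text = CRIME_GROUPS.get(normalized, "")
    return record

# "trial" is an assumed wrapper element used only for this example.
record = ET.fromstring(
    "<trial><crime_text>  Highway   Robbery </crime_text>"
    "<crime_normalized/><crime_group/></trial>"
)
populate(record)
print(ET.tostring(record, encoding="unicode"))
```

The same shape would apply to respite_text and outcome_text, with the group step simply left out for respite.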
Spent a good chunk of time learning my way around what's been done for the Bailey project, the XSLT language, and the specific data and code needs of this project. The consensus: writing a single catch-all function in the XSL stylesheet to convert every *_text value to its *_normalized value will be tough, so the odd value will likely need to be edited manually.
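One way to keep the manual editing manageable would be to list the values that match nothing in the allowable list, most frequent first. The following is only a rough Python sketch of that idea under assumed data: the `ALLOWED` set, the sample values, and the `unmatched` helper are made up for illustration and are not taken from the project.

```python
from collections import Counter

# Hypothetical allowable values; the real ones are in the proofing RNG.
ALLOWED = {"burglary", "murder"}

def unmatched(values, allowed):
    """Count the values that match nothing in the allowable list."""
    # Lower-case and collapse whitespace before comparing.
    cleaned = (" ".join(v.lower().split()) for v in values)
    return Counter(v for v in cleaned if v not in allowed)

# Made-up *_text values standing in for the real data.
raw_values = ["Burglary", "murder", "burglarie", "burglarie", "stealing a horse"]

# Print a tab-separated report, most frequent problem values first.
for value, count in unmatched(raw_values, ALLOWED).most_common():
    print(f"{count}\t{value}")
```

A report like this shows at a glance whether the leftovers are a handful of one-off spellings to fix by hand or a recurring pattern worth another rule.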