More work on normalization
Met with CC and examined some of the outcomes from our rulesets. There's obviously a huge amount of tuning still to do, but it's also clear that before each rule is run, the word needs to be checked against the dictionary in case it's already OK; if it is, there's no need to keep working on it. I've now implemented that by turning the spell-check dictionary into an XML file, which is then indexed with xsl:key (I tried other string-finding methods, but they were much slower). The transformation now takes substantially longer than it used to, but it's much clearer what's happening at each stage. One issue might be archaic forms in the spell-check dictionary, of course: a word could pass the check even though we would still want to normalize it.
Another issue is u/v variation. When we change one to the other, we often end up changing it back in a later rule. It seems likely that a better approach would be to convert every u and v to a single symbol not otherwise used in the text, and then write context-based rules that resolve that symbol to the appropriate letter in the output.