Conversions done
Posted by mholmes on 17 Feb 2010 in Activity log
I finished my XSLT conversion for fixing the encoding of glottalization, and ran it on the files awaiting work. They're now all sitting in a directory on the server called "ready_to_edit". There are only a few oddities/problems which make some of the files invalid:
- Some entries have no xml:id at all, for some reason. Rather than make one up, I'll leave it to the editor to assign one.
- Some entries have xml:ids that begin with the standalone grave accent (`, U+0060), which is not a valid character at the beginning of an xml:id. This is because that character is at the beginning of the entry itself. We need to look at these and decide a) whether it should be there or is some kind of processing error; and b) if it is correct, how we should handle it when creating xml:ids (possibly by just deleting it?).
- A couple of entries have completely borked content because the original DOS stuff was never converted over, for some reason. Those entries are few and small enough to be dealt with on a case-by-case basis; the English glosses are there, so they can be traced back to their original data.
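
For whoever picks these files up, the first two problems above could be found automatically before editing starts. What follows is only a rough sketch in Python, not the XSLT I actually ran. It assumes the entries are elements named "entry" (in any namespace), and the function names check_entries and looks_like_valid_id are made up for illustration. The validity test is a simplified approximation of the XML name rules (first character a letter or underscore; later characters letters, digits, ".", "-", "_" or combining marks), not a full implementation of the spec.

```python
import unicodedata
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def looks_like_valid_id(value):
    """Approximate check that a string could serve as an xml:id.

    Simplification (assumption): this is not the full NCName production,
    just a first-character and remaining-character test.
    """
    if not value:
        return False
    first, rest = value[0], value[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(
        ch.isalnum() or ch in "._-" or unicodedata.combining(ch)
        for ch in rest
    )


def check_entries(xml_text, entry_tag="entry"):
    """Return (position, xml:id, problem) for each problematic entry.

    entry_tag is an assumed element name; position counts matching
    entries in document order, starting at 1.
    """
    root = ET.fromstring(xml_text)
    problems = []
    position = 0
    for elem in root.iter():
        # Compare local names so the check works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] != entry_tag:
            continue
        position += 1
        entry_id = elem.get(XML_ID)
        if entry_id is None:
            problems.append((position, None, "missing xml:id"))
        elif not looks_like_valid_id(entry_id):
            problems.append((position, entry_id, "invalid xml:id"))
    return problems


# Made-up sample data: one good entry, one without an id,
# and one whose id starts with the grave accent (U+0060).
SAMPLE = '<body><entry xml:id="abc"/><entry/><entry xml:id="`abc"/></body>'

if __name__ == "__main__":
    for position, entry_id, problem in check_entries(SAMPLE):
        print(position, repr(entry_id), problem)
```

Because a bad xml:id makes a file invalid but not ill-formed, an ordinary XML parser can still read it, which is what makes a check like this possible. The entries with unconverted DOS content would still need to be found by eye.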