Got a new version of the 1790s data files from Simon. Spent a good part of this week rewriting the regular expressions and rationalizing that process. Added more steps near the beginning that involve manually editing a small number of records (fewer than 20 of 750) with various kinds of syntactic irregularities, largely inconsistent whitespace, so that the subsequent regular expressions would be simpler. Also wrote a RELAX NG schema for each step in the seven-stage process of generating the XML data files so that I could test for syntactic irregularities, even if only crudely.
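To illustrate why cleaning up whitespace first makes the later regular expressions easier, here is a minimal Python sketch. The record layout ("Surname, Forename; occupation; year"), the `RECORD` pattern, and the function names are all hypothetical and invented for illustration; they are not the actual format of the 1790s files or the expressions used in the real process.

```python
import re

# Hypothetical record layout, assumed for illustration only:
#   "Surname, Forename; occupation; year"
# Because whitespace is normalized first, the pattern can demand exactly
# one space after each separator instead of allowing for every variation.
RECORD = re.compile(
    r"^(?P<surname>[^,;]+), (?P<forename>[^,;]+); "
    r"(?P<occupation>[^;]+); (?P<year>179\d)$"
)


def normalize_whitespace(line: str) -> str:
    """Collapse whitespace runs and put one space after each separator."""
    line = re.sub(r"\s+", " ", line).strip()
    # Drop any space before a comma or semicolon, keep exactly one after it.
    line = re.sub(r" ?([,;]) ?", r"\1 ", line)
    return line.strip()


def parse_record(line: str):
    """Return the named fields as a dict, or None if the line does not match."""
    match = RECORD.match(normalize_whitespace(line))
    return match.groupdict() if match else None
```

Lines that still return `None` after normalization would be the candidates for manual editing. The per-stage RELAX NG validation is a separate check and is not shown here.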