Progress on integrating datasets
I worked on integrating some of the GIS data into the directories work, using the XML exports from the ArcGIS data to tie block and lot information to the addresses we have from the directories. This works pretty well, and could work a bit better if we accept that fractional house numbers should be lumped in with their integer components for the purposes of block/lot identification (I'm not sure whether that's always true).
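To make the fractional-number idea concrete, here is a minimal sketch of how the lumping could work. It is not the code actually used for the matching. The function names, the (street, number) key, and the shape of the lookup table are all assumptions made for illustration.

```python
import re

# Leading whole number, not followed by another digit or by a slash,
# so a bare fraction such as "1/2" is not mistaken for house number 1.
_LEADING_INT = re.compile(r"^\s*(\d+)(?!\d|\s*/)")


def integer_house_number(raw):
    """Return the whole-number part of a house number, or None.

    "123 1/2", "123½" and "123" all map to 123. Anything without a
    leading whole number (e.g. "rear", "1/2") returns None.
    """
    match = _LEADING_INT.match(raw)
    return int(match.group(1)) if match else None


def lookup_block_lot(street, raw_number, index):
    """Look up (block, lot) for an address in a hypothetical index.

    `index` is assumed to be a dict keyed by (STREET NAME, int number),
    built beforehand from the XML exports.
    """
    number = integer_house_number(raw_number)
    if number is None:
        return None
    return index.get((street.strip().upper(), number))


# Illustrative data only; these are not real block/lot values.
example_index = {("MAIN ST", 123): ("B-1", "L-7")}
print(lookup_block_lot("Main St", "123 1/2", example_index))
```

If it turns out that some fractional addresses sit on a different lot from their integer neighbour, those cases would need their own entries in the index ahead of this fallback.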
Following that, I spent some time trying to figure out how to get the binary data in the <Shape> element into a usable format. I can't find any tool outside of Arc that reads it, and I can't get our Arc license to let me launch the product, so the data as it stands can't be used. I think we need to bring in an Arc person for a day just to export everything that was done in a format that's usable outside of Arc. Most of the data is quite accessible in the XML, and other material can be read from the MDB databases and exported through Access, but this crucial <Shape> element is a complete roadblock.
JC is going to investigate this problem and report back. Meanwhile, I've enhanced the Japanese name detection by adding an exception list, which works pretty well for broad-overview purposes.
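For anyone picking this up later, the exception-list idea can be sketched roughly as follows. This is an assumption about the general shape, not the actual detection code: the underlying heuristic is passed in as a function, and the exception list is assumed to hold names that the heuristic flags incorrectly.

```python
def with_exceptions(base_detector, exceptions):
    """Wrap a name detector so that listed names are never flagged.

    `base_detector` is any function taking a name and returning True
    or False. `exceptions` is an iterable of names to exclude,
    compared case-insensitively after trimming whitespace.
    """
    blocked = {name.strip().casefold() for name in exceptions}

    def detect(name):
        if name.strip().casefold() in blocked:
            return False
        return base_detector(name)

    return detect


# Stand-in heuristic, for demonstration only; the real detector is
# not described here.
def toy_detector(name):
    return name.strip().casefold().endswith("moto")


# Hypothetical exception entry: a word the toy heuristic would
# wrongly flag.
detect = with_exceptions(toy_detector, ["Moto"])
```

Keeping the exception list separate from the heuristic means it can be extended as new false positives turn up in the directories, without touching the detection logic.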