normalize street addresses and directory pointers in census_directory_matches
I've normalized the data in the census_directory_matches.xls file so that the street number and street name of each address appear in separate fields, and so that the pointers to records in the business_directory table are correct.
Addresses:
I've split the addresses from one field into three fields:
AddNum
AddStreet
AddNotes
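The note doesn't record how the split was done. As a hypothetical sketch, assuming each raw address is a house number (possibly with a 1/2), then the street name, then optional notes in parentheses or after a comma, a regular expression could fill the three fields. The pattern, the function name split_address and the sample addresses are illustrative assumptions, not taken from the data:

```python
import re

# Hypothetical pattern (an assumption about the raw data): an optional leading
# house number, possibly followed by "1/2" or "½", then the street name, then
# optional notes introduced by "(" or ",".
ADDRESS_RE = re.compile(
    r"^\s*(?P<num>\d+(?:\s*(?:1/2|½))?)?\s*"
    r"(?P<street>[^,(]*?)\s*"
    r"(?:[,(]\s*(?P<notes>[^)]*)\)?)?\s*$"
)


def split_address(raw: str) -> dict:
    """Split one raw address string into AddNum, AddStreet and AddNotes."""
    m = ADDRESS_RE.match(raw)
    if m is None:
        # Unparseable addresses go whole into AddNotes for manual review.
        return {"AddNum": "", "AddStreet": "", "AddNotes": raw.strip()}
    return {
        "AddNum": (m.group("num") or "").strip(),
        "AddStreet": m.group("street").strip(),
        "AddNotes": (m.group("notes") or "").strip(),
    }
```

For example, "123 1/2 Main St (rear)" would become AddNum "123 1/2", AddStreet "Main St" and AddNotes "rear". Anything the pattern can't parse lands whole in AddNotes so it can be fixed by hand.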
Some of the numbers contain the 1/2 character. This may matter later: because of that character the numbers can't be treated as integers (they obviously aren't integers), and I'm not sure whether they can be treated as decimals either (that depends on whether the software is smart enough to figure out that "1/2" is the same as "0.5"). We can always treat those numbers as strings of characters (i.e. search on them as text). We'll cross that bridge when we come to it.
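If numeric treatment is ever needed, one option is to keep AddNum as text and derive an exact value alongside it. A minimal sketch using Python's fractions module, assuming AddNum is a whole number optionally followed by "1/2" (or the single ½ character); the function name add_num_value is hypothetical:

```python
from fractions import Fraction
from typing import Optional


def add_num_value(add_num: str) -> Optional[Fraction]:
    """Return AddNum as an exact number, or None if it isn't numeric.

    Assumes forms like "123", "123 1/2" or "123½". The original string
    should still be kept so it can be searched as text.
    """
    # Turn the single "½" character into " 1/2" so both spellings parse alike.
    parts = add_num.strip().replace("½", " 1/2").split()
    try:
        if len(parts) == 1:
            return Fraction(parts[0])
        if len(parts) == 2:
            # Whole part plus fractional part, e.g. "123" + "1/2".
            return Fraction(parts[0]) + Fraction(parts[1])
    except (ValueError, ZeroDivisionError):
        pass
    return None
```

Because Fraction is exact, add_num_value("123 1/2") compares equal to 123.5, so half-numbered addresses sort between their neighbours while the original string remains available for text searches.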
I made no effort to normalize street names, but let PD know that there are apparent inconsistencies in the original data, if that matters.
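If PD wants a list of the apparent inconsistencies, a rough sketch like the following could group spellings that differ only in case, punctuation or a common suffix abbreviation. The abbreviation table and the function names are hypothetical, and the grouping is only a hint for manual review, not a normalization:

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; extend it to match what appears in the data.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def street_key(name: str) -> str:
    """Crude comparison key: lowercase, drop punctuation, abbreviate suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def street_variants(names) -> dict:
    """Return {key: sorted spellings} for keys spelled more than one way."""
    groups = defaultdict(set)
    for name in names:
        groups[street_key(name)].add(name.strip())
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

For instance, "King Street", "King St." and "king st" would be reported together under the key "king st".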
Record IDs:
The copy of the database that PD derived the census_directory_matches file from was out of date, so values in the "recordId" field of the xls file pointed to non-existent records in the 1892 business_directory table.
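One way to find such dangling pointers, assuming both tables are loaded into SQLite and business_directory's key column is named id (a hypothetical name; substitute the real one), is a left join that keeps only the unmatched rows:

```python
import sqlite3

# Assumption: business_directory's primary key is a column called "id".
DANGLING_SQL = """
SELECT m.recordId
FROM census_directory_matches AS m
LEFT JOIN business_directory AS d ON d.id = m.recordId
WHERE d.id IS NULL
ORDER BY m.recordId
"""


def dangling_record_ids(conn: sqlite3.Connection) -> list:
    """recordId values in the matches table that point at no directory record."""
    return [row[0] for row in conn.execute(DANGLING_SQL)]
```

Running the same query against curDirId instead of recordId would be a quick check that the corrected pointers all resolve.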
I created a new field called curDirId in the census_directory_matches file. I copied the values from the recordId field into the curDirId field and then made the following adjustments:
For records with old recordId 2585+: adjust curDirId by +2327
For records with old recordId 2662+: adjust curDirId by -1
For records with old recordId 4003+: adjust curDirId by -1
For records with old recordId 6504+: adjust curDirId by -1
For records with old recordId 8783+: adjust curDirId by -1
For records with old recordId 12180+: adjust curDirId by -1
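The first adjustment started the numbering at the correct place. I then discovered that the current database has five fewer records than the version PD must have been using, so I had to work out where the five missing records had been and subtract 1 at each of those points. Because each rule applies to every record at or above its threshold, the adjustments accumulate: records from old recordId 12180 onward end up shifted by +2327 - 5 = +2322.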
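The adjustment rules can be sketched as a single function. This assumes, as the wording of the list implies, that each rule applies to every record at or above its threshold, and that records below old recordId 2585 (if there are any) are left unchanged; the names ADJUSTMENTS and cur_dir_id are hypothetical:

```python
# (threshold on the OLD recordId, adjustment) pairs, copied from the list above.
ADJUSTMENTS = [
    (2585, +2327),
    (2662, -1),
    (4003, -1),
    (6504, -1),
    (8783, -1),
    (12180, -1),
]


def cur_dir_id(record_id: int) -> int:
    """Map an old recordId to curDirId by summing every rule whose threshold it meets."""
    return record_id + sum(
        delta for threshold, delta in ADJUSTMENTS if record_id >= threshold
    )
```

For example, cur_dir_id(5000) is 5000 + 2327 - 1 - 1 = 7325. Under these rules old recordIds 2661 and 2662 both map to 4988 (and likewise at each later -1 threshold), which fits the record just below each threshold being one of the five missing from the current table; it may be worth confirming that no match references one of those IDs.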