There's no actual data in the check db to work with yet, so today I've just written some XSLT to create title-based views of the data suitable for comparison, and built those views into the build process so they're easy to work with. When I have some real data next week I'll be able to do the comparisons.
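For reference, generating one of these title-based views amounts to running an XSLT over a nightly dump; one way to do it, e.g. with PHP's XSLTProcessor, is sketched below. This is only a minimal sketch, not necessarily how the build actually invokes the XSLT, and the file names (title_view.xsl, checkdb_dump.xml, check_titles.xml) are placeholders.

<?php
// Minimal sketch: apply a title-view stylesheet to a nightly XML dump.
// File names are placeholders, not the real build paths.
$xsl = new DOMDocument();
$xsl->load('title_view.xsl');        // XSLT that produces the title-based view
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$dump = new DOMDocument();
$dump->load('checkdb_dump.xml');     // nightly XML dump of the check db
file_put_contents('check_titles.xml', $proc->transformToXML($dump));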
This week I'm tasked with writing an automated comparison of the data in the Maple Ridge check database against the live db. The first stage in this was adding the Check db to the processing, which:
- dumps it nightly in XML through a cron job on Grape (see previous posts)
- processes it in the same way as the other two dbs to create a more usable view, then a lot-based view
- adds the results to the downloadable products
I'll now start working on the comparison code.
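In outline, the comparison could load the title-based XML views of the check db and the live db, index the records by title number, and report titles that are missing from either side or that differ. A rough sketch of that approach (the title element and num attribute are placeholders for whatever the XSLT views actually produce):

<?php
// Rough sketch: index both title-based views by title number and report
// missing or differing records. Element/attribute names are placeholders.
function indexTitles(string $file): array {
    $index = [];
    foreach (simplexml_load_file($file)->title as $t) {
        $index[(string) $t['num']] = $t->asXML();   // keyed on the title number
    }
    return $index;
}

$check = indexTitles('check_titles.xml');
$live  = indexTitles('live_titles.xml');

foreach ($check as $num => $xml) {
    if (!isset($live[$num])) {
        echo "Title $num is in the check db but not the live db\n";
    } elseif ($live[$num] !== $xml) {
        echo "Title $num differs between the check db and the live db\n";
    }
}
foreach (array_diff_key($live, $check) as $num => $xml) {
    echo "Title $num is in the live db but not the check db\n";
}

A real version would need field-level diffing and whitespace normalisation rather than a plain string comparison, but this is the shape of it.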
JSR requested this on Friday. It's now done, with three back-end tables populated (ethnicities, institution types, and locations), and with the colour and heading changed to clearly distinguish it from the original db and prevent confusion. I have his permission to delete the landscapes_backup and landscapes_copy versions of the Powell St db, which I'll do when I get the chance.
JSR is now looking at the spreadsheet I created a couple of weeks ago attempting to integrate GIS data with our lot-based info. Many questions arise out of this, and we've had a long email discussion. I've done some programmatic investigation of the more difficult issues, including block/lot identifier collisions, and we're converging on a set of instructions for moving forward, which I'll codify when the discussions are complete and execute next week.
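The collision check itself is straightforward in outline: group the spreadsheet rows by their block/lot identifier and flag any identifier shared by more than one lot record. A sketch of that kind of check, assuming a CSV export with hypothetical file name and column positions (the real spreadsheet will differ):

<?php
// Sketch of a block/lot collision check over a CSV export of the spreadsheet.
// The file name and column positions are hypothetical.
$seen = [];                                    // "block/lot" key => list of row numbers
$fh = fopen('lots_with_gis.csv', 'r');
$header = fgetcsv($fh);                        // assume the first row is a header
$rowNum = 1;
while (($fields = fgetcsv($fh)) !== false) {
    $rowNum++;
    $key = trim($fields[0]) . '/' . trim($fields[1]);   // assume block and lot are the first two columns
    $seen[$key][] = $rowNum;
}
fclose($fh);

foreach ($seen as $key => $rows) {
    if (count($rows) > 1) {
        echo "Collision: block/lot $key appears on rows " . implode(', ', $rows) . "\n";
    }
}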
Tested on dev, then deployed and tested on live. These are the details:
/* Changes to the db to add support for Rights to Purchase. */

/* Additions to the Titles table. */
ALTER TABLE `titles` ADD COLUMN `ttl_rp_amount` int(11) default NULL AFTER `ttl_marketvalue`;
ALTER TABLE `titles` ADD COLUMN `ttl_rp_interest` DECIMAL(5,2) default NULL AFTER `ttl_rp_amount`;
ALTER TABLE `titles` ADD COLUMN `ttl_rp_outstanding` int(11) default NULL AFTER `ttl_rp_interest`;

/* Creation of a new linking table for RP holders. */
CREATE TABLE IF NOT EXISTS `rpholders_to_titles` (
  `rph_rph_id` int(11) NOT NULL auto_increment,
  `rph_owner_id_fk` int(11) NOT NULL,
  `rph_title_id_fk` int(11) NOT NULL,
  PRIMARY KEY (`rph_rph_id`),
  KEY `rph_owner_id_fk` (`rph_owner_id_fk`),
  KEY `rph_title_id_fk` (`rph_title_id_fk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;

ALTER TABLE `rpholders_to_titles`
  ADD CONSTRAINT `rpholders_to_titles_ibfk_1` FOREIGN KEY (`rph_owner_id_fk`) REFERENCES `owners` (`own_owner_id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `rph_to_titles_ibfk_2` FOREIGN KEY (`rph_title_id_fk`) REFERENCES `titles` (`ttl_title_id`) ON DELETE CASCADE ON UPDATE CASCADE;

/* Additions to the Titles table in local_classes.php. */
$this->addField(new MdhOneToManyField('ttl_rpholders', 'RP Holders', 'ttl_title_id', 'owners', 'own_owner_id', 'own_desc', 'rpholders_to_titles', 'rph_rph_id', 'rph_title_id_fk', 'rph_owner_id_fk', true, 'own_desc', true));
$this->addField(new MdhIntField('ttl_rp_amount', 'RP Amount', '', true));
$this->addField(new MdhDecimalField('ttl_rp_interest', 'RP Interest Rate', '', 5, 2, true));
$this->addField(new MdhIntField('ttl_rp_outstanding', 'RP Outstanding Balance', '', true));
Implemented the bundle code feature, but didn't want to make major changes to the db while the team is working on it; I've written the SQL and PHP to do it, and first thing tomorrow morning I'll run them on dev and, if they work, port the changes to live.
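The change itself is small; in outline it amounts to one new text column on titles plus a matching field definition in local_classes.php, something like the following (the column name ttl_bundle_code and the MdhTextField class are placeholders of mine, not necessarily what the actual scripts use):

// SQL side (placeholder column name and type):
//   ALTER TABLE `titles` ADD COLUMN `ttl_bundle_code` varchar(255) default NULL;
// local_classes.php side, alongside the other Titles fields:
$this->addField(new MdhTextField('ttl_bundle_code', 'Bundle Code', '', true));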
I will go ahead with:
Bundles: add a Bundle Code (text) field to the Titles database, so that if necessary any title can be linked with its bundle as a digital image.
Rights to Purchase: add these fields to the Titles table:
- RP Holders (one-to-many field linking to the Owners table)
- RP Amount (integer field for dollars)
- RP Interest Rate (decimal field for percentages)
- RP Still Owing (integer field for dollars)
The RP Code, if present, would be mentioned in the Notes field.
Some email discussion and a Google Hangout to figure out what to do about RPs and the Bundle documents; we have arrived at a proposal that requires some changes to the database, but I'm waiting for responses from the others before implementing it.
Posted report on progress to end of May at
https://basecamp.com/2690203/projects/6501222/messages/58643365
We continue to work on the 1949 person index for Vancouver. There are 604 pages in the person directory, and we have gone through 240 of them so far, consuming about 15 person-days. This is going very slowly, as the documents are hard to read. I notice that in my last posting I reversed the references to the 1949 person and 1949 street indexes: I said we'd completed the person index and had begun on the street index, when in fact we had completed the street index and had begun on the person index.
In collaboration with Jordan's group, we have assembled a spreadsheet for about 300 letters of protest collected by the Custodian and identified the presence in each of about 16 types of claim (e.g. lack of consent, violation of rights, explicit mention of fishing assets), along with author names and other details.
We have also OCR'd, transcribed and proofed 3 of the 4 lists of unsold properties held by the Custodian. The first list has about 480 records. The second and third are sublists of the first (i.e. no new properties have been added), so rather than transcribe all of that again, we're adding "appears on list 2" and "appears on list 3" columns to the first list and putting a 1 or 0 in those as appropriate for each record.
The fourth list of about 100 records consists of some properties on the first list and some new properties, and the OCR isn't helpful, so we'll manually transcribe the new records and put a "1" in the "appears on list 4" column for the records that already appear on lists 1 through 3. It looks like we may have to add some columns to the tables to accommodate the structure of the entries in the fourth list, which are far less consistent than those in the first list.
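Filling in those "appears on list N" columns is a mechanical matching exercise once the lists share a property identifier. A minimal sketch of the sort of script we could use, with placeholder file names and a hypothetical property_id column standing in for whatever identifier the lists actually share:

<?php
// Sketch: flag records on list 1 that also appear on one of the sublists (here list 4).
// File names and the 'property_id' column are placeholders.
function loadIds(string $file, string $idColumn): array {
    $fh = fopen($file, 'r');
    $header = fgetcsv($fh);
    $col = array_search($idColumn, $header);
    $ids = [];
    while (($row = fgetcsv($fh)) !== false) {
        $ids[trim($row[$col])] = true;
    }
    fclose($fh);
    return $ids;
}

$list4 = loadIds('unsold_list4.csv', 'property_id');

// Re-write list 1 with an extra "appears on list 4" column of 1s and 0s.
$in  = fopen('unsold_list1.csv', 'r');
$out = fopen('unsold_list1_flagged.csv', 'w');
$header = fgetcsv($in);
$col = array_search('property_id', $header);
fputcsv($out, array_merge($header, ['appears_on_list_4']));
while (($row = fgetcsv($in)) !== false) {
    fputcsv($out, array_merge($row, [isset($list4[trim($row[$col])]) ? 1 : 0]));
}
fclose($in);
fclose($out);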
Next week:
- we will begin analyzing some of the protest letters and some of the Campbell fonds letters to figure out what modifications we have to make to the schema file (e.g. elements for the opening and closing material of letters), and mark up a sample of documents to test the modifications and estimate the time needed to transcribe 1- to 2-page letters.
- continue work on the 1949 person directory.
- we still have all the directory files listed last time in the job hopper.
- talk with Martin about adding ethnicity attribution to names in our XML files.
Posted progress over the first few weeks to Basecamp:
https://basecamp.com/2690203/projects/6501222/messages/58129530
We've completed:
- transcribe 1941 BC and Yukon Directory for Steveston into XML (~800 records / 1 person-week incl training) and add to directories/data section of svn repository
- add legal lot information to Mizuta spreadsheet of street addresses in Steveston (1 person-week) and add to directories/data section of svn repository
- transcribe 1941 BC and Yukon Directory for Haney into XML (~800 records / 1 person-week incl training) and add to directories/data section of svn repository
- survey ~480 protest letters to identify non-economic claims, names of complainants and group docs that form correspondences (1/2 person-week) and submit report to Jordan
- transcribe 1949 BC and Yukon Directory Person listing for Vancouver into XML (~900 records / 2 person-weeks) and add to directories/data section of svn repository
- scan map pages from Mizuta book on Steveston, Yamaga book on Haney, Hashizume book on Mission (the latter to help in deciding whether the Fraser Valley area should be extended, and then as a possible source for names), post on Zotero for internal use by other clusters, and add to directories/data section of svn repository
Our method has been for two transcribers to each work on half of the same source document simultaneously, discussing problematic cases etc. to ensure consistent treatment. On completion of a source, each transcriber proofs a 10% sample of the other transcriber's work and we jointly review the results.
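The 10% sample can be drawn mechanically rather than by eye; a minimal sketch, assuming the transcription is an XML file with one record element per directory entry (the file and element names are placeholders):

<?php
// Minimal sketch: pick a random 10% sample of records from a transcription
// for the other transcriber to proof. File and element names are placeholders.
$doc = simplexml_load_file('steveston_1941.xml');

$records = [];
foreach ($doc->record as $r) {
    $records[] = $r;
}

$sampleSize = max(1, (int) round(count($records) * 0.10));
$keys = (array) array_rand($records, $sampleSize);

foreach ($keys as $k) {
    echo $records[$k]->asXML() . "\n";
}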
Currently working on:
- transcribe 1949 BC and Yukon Directory Street listing for Vancouver into XML (~2700 records estimate 6 person-weeks)
Waiting in job-hopper:
- transcribe 1930 Haney (~800 records / 1 person-week)
- transcribe 1930 Steveston (~800 records / 1 person-week)
- transcribe 1949 Haney (~800 records / 1 person-week)
- transcribe 1949 Steveston (~800 records / 1 person-week)
- transcribe 1943 Haney (~800 records / 1 person-week)
- transcribe 1943 Steveston (~800 records / 1 person-week)
- transcribe 1942 Haney (~800 records / 1 person-week)
- transcribe 1942 Steveston (~800 records / 1 person-week)
Possible full-text transcriptions:
English documents selected from material accessed/imaged as items by Nikkei people, for:
- Kagetsu fonds
- Maikawa fonds
- Kimura fonds
- Various (Araki, Nishihata, Jusuke Ishikawa, Saito)
- Campbell et al. fonds
Selections from
- Custodian's collection of protest letters