Pace Mr. Jenkins, I have the downloadable package being created to support the chapter JSR is writing, and I've finished the data dictionary. I think this bit of the project should now be done.
Yesterday and today: JSR asked for more changes/additions to the original spreadsheet, and also for a second spreadsheet, organized by title. I've implemented the changes to the first spreadsheet, which is now renamed, and created the second. I found some anomalies in the data in the process, and fixed a couple of bad dates, while reporting some more suspicious numbers from the db. Also implemented an HTML version of the new spreadsheet linking to title maps, which will be handy.
This spreadsheet, which I started at the weekend, contains brief analyses of the numbers of properties which end up being included (directly or indirectly) in the main spreadsheet, compared with the total number of properties in the database, for each study area location.
At the same time, I diagnosed and fixed a bug in the generation of GeoJSON for maps, which would place two copies of the same property inside a single title category on the map.
Flags for nominal custodian sales and for sales intervening between T1 and T2. New plan for additional spreadsheet showing status of properties included and not included in the main spreadsheet, to be done next week.
I've got a working process that detects and removes actual duplicate rows from the table. But there's still the question of why we have multiple rows for the same starting property. I'm finding it impossible to figure that out so far.
Implementing the revisions from the latest discussions, I'm encountering situations in which the same property id has two or three rows. Only a few, but we can't figure out why they're happening. More to do, still in discussions with JSR.
The fourth plan necessitates rewriting all the code to retrieve T1 and T2, so I'm a good way into that, but more questions arose while doing that, which are still being worked out, so we're not done this week, unfortunately.
This is it, to be implemented ASAP:
1. ensure that custodian sales are found NOT subject to the requirement to be non-nominal; then add a constructed variable: NOMINAL_CUSTODIAN_SALE=1. 2. After finding a T2, search for any qualifying transactions on descendant properties of the T1 which occur before 1942-03-04, and output a constructed variable with a count of these: T1_T2_INTERVENING=5. 3. After finding T1 and a subsequent custodian sale T2, search for NON-NOMINAL sales to JCs on any descendant property of the original property between the T1 date and the custodian sale date. In such a case, abandon construction of the original row and instead construct a new row in which that non-nominal sale is the T1 (retaining the already-found custodian sale as T2). 4. After finding T1 and a subsequent custodian sale T2, and assuming action #3 does not cause the row to be rejected, search for NOMINAL sales to JCs on any descendant property of the original property between the T1 date and the custodian sale date. Output a constructed variable with a count of these (NOMINAL_LATE_SALES_TO_JC=x).
Discussions over the weekend led to a meeting today in which we thrashed out some of the issues, and we're half-way to a new plan; the immediate requirement was to investigate the cases where there is a sale to a JC buyer between the start date of the custodian activity and the actual sale of a property by the custodian. For the record, the quick-and-dirty XQuery to answer this question is below, along with the answers. It's clear these cases are not edge cases we can ignore; there's a definite pattern here.
let $baseDir := 'file:///home/mholmes/WorkData/history/stanger-ross/landscapes_of_injustice/svnrepo/trunk/xml/', $startDate := xs:date('1942-03-04'), $vanDbFile := concat($baseDir, 'landscapes_live_current_lotBased.xml'), $mapDbFile := concat($baseDir, 'landscapes_mapridgelive_current_lotBased.xml'), $dbs := (doc($vanDbFile), doc($mapDbFile)), $custSales := distinct-values(($dbs//title[hasCustodianSeller != '0'][@isNominal='false'][not(matches(effectiveDate, '00'))]/@id)) for $t in $custSales let $title := $dbs/descendant::title[@id = $t], $tDate := xs:date($title/effectiveDate), $lotIds := $title/lotsForTitle/lot/@id, $targets := $dbs/descendant::title[not(matches(effectiveDate, '00'))][lotsForTitle/lot/@id = $lotIds][someOwnersJapanese != '0'][xs:date(effectiveDate) lt $tDate and xs:date(effectiveDate) gt $startDate] return if (count($targets) gt 0) then concat('Title ', $t, ' has JC purchases in the key window: ', string-join(distinct-values($targets/@id), ','), ' ') else ()
Title 1034 has JC purchases in the key window: 1111,1159,1160 Title 1701 has JC purchases in the key window: 1805,1807 Title 2926 has JC purchases in the key window: 2963 Title 5323 has JC purchases in the key window: 5379,5423 Title 5460 has JC purchases in the key window: 5502 Title 3799 has JC purchases in the key window: 1172 Title 4113 has JC purchases in the key window: 4192 Title 4640 has JC purchases in the key window: 4708 Title 5068 has JC purchases in the key window: 5122,5123 Title 6453 has JC purchases in the key window: 6456 Title 6605 has JC purchases in the key window: 6607 Title 6649 has JC purchases in the key window: 6661 Title 29141 has JC purchases in the key window: 29144 Title 29329 has JC purchases in the key window: 29278
Per plan worked out in email and on the phone with JSR, itemized as plan_3.txt, made a range of changes to the process of generating the spreadsheet. Some changes still being figured out.