The fourth plan requires rewriting all the code that retrieves T1 and T2. I'm a good way into that, but more questions arose along the way and are still being worked out, so unfortunately we won't be done this week.
This is it, to be implemented ASAP:
1. Ensure that custodian sales are found NOT subject to the requirement to be non-nominal; then add a constructed variable: NOMINAL_CUSTODIAN_SALE=1.
2. After finding a T2, search for any qualifying transactions on descendant properties of the T1 which occur before 1942-03-04, and output a constructed variable with a count of these: T1_T2_INTERVENING=5.
3. After finding T1 and a subsequent custodian sale T2, search for NON-NOMINAL sales to JCs on any descendant property of the original property between the T1 date and the custodian sale date. In such a case, abandon construction of the original row and instead construct a new row in which that non-nominal sale is the T1 (retaining the already-found custodian sale as T2).
4. After finding T1 and a subsequent custodian sale T2, and assuming action #3 does not cause the row to be rejected, search for NOMINAL sales to JCs on any descendant property of the original property between the T1 date and the custodian sale date. Output a constructed variable with a count of these (NOMINAL_LATE_SALES_TO_JC=x).
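For reference, actions 3 and 4 might be sketched roughly as follows. This is only an illustration, not the production code: the element and attribute names (title, @id, @isNominal, effectiveDate, someOwnersJapanese) are borrowed from the lot-based files, but the function local:jc-sales-in-window, the sample ids and dates, and the assumption that the titles on descendant properties have already been collected into $candidates are all hypothetical. It also assumes every effectiveDate is a complete date, and treats "between" as exclusive at both ends.

```xquery
xquery version "3.0";

(: Hypothetical sample data: invented ids and dates, standing in for the
   titles already found on descendant properties of the original property. :)
declare variable $titles :=
  <titles>
    <title id="9001" isNominal="false"><effectiveDate>1942-05-01</effectiveDate><someOwnersJapanese>1</someOwnersJapanese></title>
    <title id="9002" isNominal="true"><effectiveDate>1942-11-20</effectiveDate><someOwnersJapanese>1</someOwnersJapanese></title>
    <title id="9003" isNominal="false"><effectiveDate>1942-07-07</effectiveDate><someOwnersJapanese>0</someOwnersJapanese></title>
    <title id="9004" isNominal="true"><effectiveDate>1944-01-01</effectiveDate><someOwnersJapanese>1</someOwnersJapanese></title>
  </titles>;

(: Hypothetical helper: sales to JCs dated strictly between T1 and the
   custodian sale T2, restricted to nominal or to non-nominal sales. :)
declare function local:jc-sales-in-window(
  $candidates as element(title)*,
  $t1Date as xs:date,
  $t2Date as xs:date,
  $nominal as xs:boolean
) as element(title)* {
  $candidates
    [someOwnersJapanese != '0']
    [(@isNominal = 'true') = $nominal]
    [xs:date(effectiveDate) gt $t1Date and xs:date(effectiveDate) lt $t2Date]
};

let $t1Date := xs:date('1941-06-01'),   (: invented T1 date :)
    $t2Date := xs:date('1943-09-15'),   (: invented custodian sale date :)
    $candidates := $titles/title,
    $nonNominal := local:jc-sales-in-window($candidates, $t1Date, $t2Date, false()),
    $nominal := local:jc-sales-in-window($candidates, $t1Date, $t2Date, true())
return
  if (exists($nonNominal))
  (: Action 3: abandon the original row; a non-nominal sale becomes the new T1. :)
  then concat('Rebuild row with T1 from: ', string-join($nonNominal/@id, ','))
  (: Action 4: keep the row and report the count of nominal sales. :)
  else concat('NOMINAL_LATE_SALES_TO_JC=', count($nominal))
```

With this sample data the non-nominal branch is taken, because title 9001 is a non-nominal sale to a JC inside the window. The plan does not say which sale becomes the new T1 when several qualify, so the sketch just lists them all.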
Discussions over the weekend led to a meeting today in which we thrashed out some of the issues, and we're half-way to a new plan; the immediate requirement was to investigate the cases where there is a sale to a JC buyer between the start date of the custodian activity and the actual sale of a property by the custodian. For the record, the quick-and-dirty XQuery to answer this question is below, along with the answers. It's clear these cases are not edge cases we can ignore; there's a definite pattern here.
let $baseDir := 'file:///home/mholmes/WorkData/history/stanger-ross/landscapes_of_injustice/svnrepo/trunk/xml/',
    $startDate := xs:date('1942-03-04'),
    $vanDbFile := concat($baseDir, 'landscapes_live_current_lotBased.xml'),
    $mapDbFile := concat($baseDir, 'landscapes_mapridgelive_current_lotBased.xml'),
    $dbs := (doc($vanDbFile), doc($mapDbFile)),
    (: Non-nominal custodian sales with a usable date (dates containing '00' are skipped). :)
    $custSales := distinct-values(($dbs//title[hasCustodianSeller != '0'][@isNominal='false'][not(matches(effectiveDate, '00'))]/@id))
for $t in $custSales
let $title := $dbs/descendant::title[@id = $t][1],
    $tDate := xs:date($title/effectiveDate),
    $lotIds := $title/lotsForTitle/lot/@id,
    (: Titles sharing a lot with the custodian sale, with JC owners, dated strictly between the start date and the sale date. :)
    $targets := $dbs/descendant::title[not(matches(effectiveDate, '00'))][lotsForTitle/lot/@id = $lotIds][someOwnersJapanese != '0'][xs:date(effectiveDate) lt $tDate and xs:date(effectiveDate) gt $startDate]
return
    if (count($targets) gt 0)
    then concat('Title ', $t, ' has JC purchases in the key window: ', string-join(distinct-values($targets/@id), ','), ' ')
    else ()
Title 1034 has JC purchases in the key window: 1111,1159,1160
Title 1701 has JC purchases in the key window: 1805,1807
Title 2926 has JC purchases in the key window: 2963
Title 5323 has JC purchases in the key window: 5379,5423
Title 5460 has JC purchases in the key window: 5502
Title 3799 has JC purchases in the key window: 1172
Title 4113 has JC purchases in the key window: 4192
Title 4640 has JC purchases in the key window: 4708
Title 5068 has JC purchases in the key window: 5122,5123
Title 6453 has JC purchases in the key window: 6456
Title 6605 has JC purchases in the key window: 6607
Title 6649 has JC purchases in the key window: 6661
Title 29141 has JC purchases in the key window: 29144
Title 29329 has JC purchases in the key window: 29278
Per plan worked out in email and on the phone with JSR, itemized as plan_3.txt, made a range of changes to the process of generating the spreadsheet. Some changes still being figured out.
In emails with JSR, worked out the plan below, and implemented all the code parts; I'm now halfway through the data dictionary, which is proving helpful in clarifying our procedures for me as well as a potential reader.
Made a number of updates to generation of XML, GeoJSON and spreadsheet information to make for clearer output, and to build 2018 values into the whole process so any new products are consistent. Maps are now easier to read, and all include 2018 dollars. Lots more to be done, though.
Today was long and complicated but it seems to be working. I'm now producing JSON files for each row in the spreadsheet table, and I have an HTML representation of the spreadsheet which provides links to the maps from those story files. That puts us in a position to examine lots of real cases and see whether there are any flaws in our retrieval and processing, as well as to get a sense of the actual stories we're uncovering.
As far as I can tell, the spreadsheet is now doing what it's supposed to, after consultation with JSR and reworking. We will need only one spreadsheet rather than two. I'm now working on generating a GeoJSON file for each row, so we can look at what we're generating and do human sanity checks.
It lacks comparison title(s) pending decisions about how those should be selected, but it works! Took all day to get there. Lots of outstanding questions, but I think we're on the way.
Did a lot of work today on creating and testing output for specific lot/title combinations, as cell sequences, and rendering them into TSV. I've also re-worked the lot-based view a little to add precomputed values of all kinds that I need for the spreadsheet, and completed more functions that retrieve properties and titles according to various parameters. It's getting to a point where I can see what now needs to be done for the first target output spreadsheets, and I've sent a couple of questions to JSR regarding precise details of what's required, where the plan was a little vague.
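As a rough illustration of the cell-sequence-to-TSV step, here is a minimal sketch. The rows/row/cell structure and the function local:tsv-cell are hypothetical stand-ins, not the project's actual intermediate format, and the cell values are invented; only the two column names come from the plan.

```xquery
xquery version "3.0";

(: Hypothetical sample rows: a header row followed by one data row. :)
declare variable $rows :=
  <rows>
    <row><cell>TITLE_ID</cell><cell>NOMINAL_CUSTODIAN_SALE</cell><cell>T1_T2_INTERVENING</cell></row>
    <row><cell>9001</cell><cell>1</cell><cell>0</cell></row>
  </rows>;

(: Hypothetical helper: replace tabs and line breaks inside a value with a
   single space so that they cannot break the column or row layout. :)
declare function local:tsv-cell($cell as xs:string?) as xs:string {
  replace(($cell, '')[1], '[\t\n\r]+', ' ')
};

(: Join the cells of each row with tabs, then join the rows with line feeds. :)
string-join(
  for $row in $rows/row
  return string-join(
    for $c in $row/cell return local:tsv-cell($c),
    '&#9;'
  ),
  '&#10;'
)
```

Cleaning each cell before joining keeps the output rectangular even when a source value contains stray whitespace, which makes the result safer to open directly in a spreadsheet program.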