The FODS tables to TEI tables conversion is now working; a quick spot check seems that the results are correct, but we'll need to run some diagnostics on it both to ensure that the conversion worked correctly and to identify any oddities in the source data. Added a bunch of documentation, too.
Spent some time evaluating the spreadsheets and thinking through the best process for turning them into RADish. Got a basic build set up with a conversion process. So far, the process looks like so:
- Copy the files into a temporary directory and, in doing so, clean up the filenames (no brackets, no spaces)
- Take those files and convert them to FODS using soffice (there's lots there so it takes a while)
- Then, take those FODS and convert them into a TEI table; I'm still working on this bit, but, so far, I have a process whereby the XSLT gets all the fods in a particular directory (using collection()) and then combines them into TEI document with multiple tables. This might not be the right approach in the long run, but I think it makes the most sense for now, particularly if the various spreadsheets in a collection need to be reconciled
Coded the third new spreadsheet, which wasn't as straightforward as the others. Updated documentation and data dictionary. The final spreadsheet has been dropped from the plan because it doesn't actually make sense.
After some debugging, I've got the last-custodian and no-custodian-t3 versions of the spreadsheets working (I think -- Jenkins is running now), and I've mapped out how the third variant should work. That third variant, as it's currently specified, makes no sense at all to me, though, so I've written to JSR to get some clarification before I put in the considerable amount of work necessary to create it. Meanwhile I can work on the new titles spreadsheet.
Parameterized a couple of functions and implemented the changes required to get #3 on plan 9 done (that's the simplest). Running the test now.
Today was finalizing plan #8 and completing it (more or less); then formulating plan #9:
*********************** *** Eighth plan modification: *** 1. DONE: Test the assumption that Maple Ridge properties were heavily transacted between the T2 in the current spreadsheet and 1949-04-01. 2. DONE: Add a column to the lot spreadsheet which shows the full consideration in 2018 dollars for each title. E.g. T1_CONSIDERATION_2018. 3. DONE: In the lot-based spreadsheet, add a T0, which is the last non-nominal transaction prior to T1 which includes an ancestor property of the root property (for all rows in the spreadsheet). 4. DONE: Add documentation of the VLA spreadsheet to the data dictionary. 5. DONE: Specify end date for T5s in data dictionary. 6. DONE: Fix the sequencing of titles in the BreezeMap output. (CAVEAT: there appears to be a small subset of titles for which this is not working, including 26684. I can't figure out why yet, but it seems to be working fine for the vast majority, so unless it's a priority I'll put it aside for now.) *********************** This is plan #9, for your final approval: *********************** *** Ninth plan modification: *** 1. Rename the original lot spreadsheet property_stats_data_by_lot_2018_orig. 2. Create a post-custodian sales spreadsheet of titles (property_stats_data_by_title_post_custodian_2018) like this: a) Get all custodian sales. b) For each custodian sale title, get the properties. c) For all those properties, get descending properties. d) Get all titles covering that expanded set of properties. e) Filter those titles to exclude any which pre-date the original custodian sale. f) Add the remaining titles to a list. g) For each distinct title in the list, ordered by title id, output a row. QUESTION: Do you care about nominal versus non-nominal in this spreadsheet? 3. Create a new version of the original lot spreadsheet (property_stats_data_by_lot_2018_no_cust_T3) in which T3 is never allowed to be a custodian sale. Call this the no-custodian-T3 rule. 4. Create another new version of the original lot spreadsheet (property_stats_data_by_lot_2018_last_cust) in which we always use the last transaction by the custodian rather than the first (the last-custodian rule). 5. Create a new version of the current lot spreadsheet (property_stats_data_by_lot_2018_strict_descent) where the descendant-property scenario is modified such that at each stage, we only retrieve descendants of properties in the preceding stage, rather than any descendant of the ur-property. QUESTION: does it matter that this will inevitably give rise to chains where the later titles have no land in common with the earlier titles? QUESTION: should this spreadsheet use the no-custodian-T3 rule? QUESTION: should this spreadsheet use the last-custodian rule? *******************
Did some work on the developing schema for RADish.
As instructed by JSR, did a deep dive into the property. Results:
Summary: the lot spreadsheet appears to be constructed (in this test case, at least) according to the rules that we set up. The full process for checking this appears in a short version and a long version below, but first, my take: MY TAKE ON THIS: There are two issues here. First, the Maple Ridge properties start to fragment very quickly indeed, giving rise to a huge number of descendant properties and therefore titles; the chances of getting a simple, clean sequence which aligns with the narrative you're chasing are probably pretty slim. Second, we don't have a rule which says that when we are retrieving T3, it should NOT be a custodian sale IF the T2 is a custodian sale. I think it would make sense to have that rule, to avoid the situation we see in this case where T2, T3 and T4 are all custodian sales. Alternatively, we could adopt a rule which says that when there are multiple custodian sales to choose from at the T2 stage, we always take the latest rather than the earliest. Nevertheless, these measures may not solve your problem; do we need a similar rule that says that if T3 is a VLA sale, then T4 is not allowed to be one? What if T3 is not a VLA sale, but T4 turns out to be one? Another option would be to adopt a rule which says that at every stage, the subsequent property in the sequence must be a descendant property not of the original ur-property but of the property or properties which were on the preceding retrieved title -- so T3 must be on descendant properties of the properties on T2, not on T1, and T4 must cover descendant properties of T3. This is more likely to result it what looks like a coherent sequence, but it is also much more likely to fail to retrieve a successful sequence at all, resulting in a much diminished number of spreadsheet rows, and therefore less statistically convincing results. --------------------------------------------------------------------- DEEP DIVE INTO PROPERTY 21178 Base property: T:12 S:20 Q:NE 1/4 PL:3189 L:1 (p_21178) --------------------------------------------------------- QUERY on overlaps.xml: declare default element namespace "http://hcmc.uvic.ca/ns"; let $baseProp := '21178', $overlaps := //overlap[@prop1=$baseProp or @prop2=$baseProp][xs:float(@percentProp1InProp2) ge 50], $overlapIds := for $o in $overlaps return ($o/@prop1, $o/@prop2), $distinctOverlapIds := distinct-values($overlapIds) return string-join($distinctOverlapIds, ', ') --------------------------------------------------------- Overlapping properties: 21373, 21298, 21297, 21622, 21623, 21616, 21617, 21618, 21621, 21619, 21620, 21574, 21575, 21416, 21682, 21681, 21374, 21614, 21376, 21375, 21615, 21330, 21329, 21255, 21252, 21257, 21256, 21254, 21253 Ids of titles which cover all of these properties: --------------------------------------------------------- XQuery on landscapes_mapridgelive_current_titleBased.xml: let $titles := //title[props/prop[@refid=('21373', '21178', '21298', '21297', '21622', '21623', '21616', '21617', '21618', '21621', '21619', '21620', '21574', '21575', '21416', '21682', '21681', '21374', '21614', '21376', '21375', '21615', '21330', '21329', '21255', '21252', '21257', '21256', '21254', '21253')]], $titleDesc := for $t in $titles order by $t/titleInfo/ttl_date return concat($t/@id, ': ', $t/titleInfo/ttl_title_code, ' (', $t/titleInfo/ttl_date, ')') return string-join((for $t in $titles order by $t/titleInfo/ttl_date return $t/@id), ', ') --------------------------------------------------------- 26684, 26959, 27763, 26688, 26675, 26723, 26732, 26733, 26725, 26730, 26788, 26796, 26791, 26792, 26799, 26807, 26811, 26809, 26812, 26814, 26816, 26762, 26763, 26883, 26927, 26854, 26884, 26885, 26929, 26974, 26984, 26857, 26849, 26987, 26871, 27564, 26945, 26956, 26944, 26955, 26958, 26994, 27581, 27582, 27929, 27014, 27015, 27016, 27571, 27017, 27020, 27654, 26949, 26970, 27930, 27649, 27936, 27021, 27667, 27668, 27712, 27714, 27673, 27729, 27728, 27730, 27731, 27734, 27733, 27677, 27940, 27727, 27759, 27679, 27766, 27710, 27769, 27724, 27708, 27761 Details of these titles: XQuery on landscapes_mapridgelive_current_lotBased.xml: --------------------------------------------------------- let $titleIds := ('26684', '26959', '27763', '26688', '26675', '26723', '26732', '26733', '26725', '26730', '26788', '26796', '26791', '26792', '26799', '26807', '26811', '26809', '26812', '26814', '26816', '26762', '26763', '26883', '26927', '26854', '26884', '26885', '26929', '26974', '26984', '26857', '26849', '26987', '26871', '27564', '26945', '26956', '26944', '26955', '26958', '26994', '27581', '27582', '27929', '27014', '27015', '27016', '27571', '27017', '27020', '27654', '26949', '26970', '27930', '27649', '27936', '27021', '27667', '27668', '27712', '27714', '27673', '27729', '27728', '27730', '27731', '27734', '27733', '27677', '27940', '27727', '27759', '27679', '27766', '27710', '27769', '27724', '27708', '27761'), $titles := for $tId in $titleIds return //titles/title[@id = $tId][not(preceding::titles/title[@id=$tId])], $titleInfos := for $t in $titles order by $t/effectiveDate return concat($t/@id, ': ', $t/code, ' ', $t/effectiveDate, ' ', ' nominal: ', if ($t/@isNominal = 'true') then 'yes ' else 'no ', ' custSale: ', if ($t/hasCustodianSeller = '1') then 'yes ' else 'no ', ' vlaBuy: ', if ($t/hasVlaOwner = '1') then 'yes ' else 'no ', ' vlaSell: ', if ($t/hasVlaSeller = '1') then 'yes ' else 'no ' ) return string-join($titleInfos, ' ') --------------------------------------------------------- 26688: 32738E 1920-05-10 nominal: yes custSale: no vlaBuy: no vlaSell: no 26675: 74159E 1927-10-16 nominal: no custSale: no vlaBuy: no vlaSell: no 26684: 106669E 1934-03-03 nominal: yes custSale: no vlaBuy: no vlaSell: no 26723: 106822E 1934-03-21 nominal: no custSale: no vlaBuy: no vlaSell: no 26732: 107183E 1934-04-24 nominal: no custSale: no vlaBuy: no vlaSell: no 26733: 107254E 1934-05-02 nominal: no custSale: no vlaBuy: no vlaSell: no 26725: 107963E 1934-07-18 nominal: no custSale: no vlaBuy: no vlaSell: no 26730: 109668E 1934-12-06 nominal: yes custSale: no vlaBuy: no vlaSell: no 26788: 113071E 1935-10-29 nominal: yes custSale: no vlaBuy: no vlaSell: no 26796: 116972E 1936-09-18 nominal: no custSale: no vlaBuy: no vlaSell: no 26791: 132983E 1939-09-15 nominal: no custSale: no vlaBuy: no vlaSell: no 26792: 149801E 1942-04-02 nominal: yes custSale: no vlaBuy: no vlaSell: no 26799: 166279E 1943-12-28 nominal: yes custSale: no vlaBuy: no vlaSell: no 26807: 169926E 1944-04-17 nominal: yes custSale: no vlaBuy: no vlaSell: no 26811: 169945E 1944-04-17 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26809: 169933E 1944-04-18 nominal: yes custSale: no vlaBuy: no vlaSell: no 26812: 169952E 1944-04-18 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26814: 170808E 1944-04-25 nominal: yes custSale: no vlaBuy: no vlaSell: no 26816: 170828E 1944-04-25 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26762: 171885E 1944-05-18 nominal: yes custSale: no vlaBuy: no vlaSell: no 26763: 171903E 1944-05-18 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26883: 175591E 1944-09-02 nominal: no custSale: no vlaBuy: no vlaSell: no 26927: 281892E 1951-04-16 nominal: no custSale: no vlaBuy: no vlaSell: no 26854: 291940E 1952-01-11 nominal: no custSale: no vlaBuy: no vlaSell: yes 26884: 295379E 1952-04-16 nominal: no custSale: no vlaBuy: no vlaSell: yes 26885: 295380E 1952-04-16 nominal: no custSale: no vlaBuy: no vlaSell: no 26929: 313897E 1953-08-05 nominal: no custSale: no vlaBuy: yes vlaSell: no 26974: 322796E 1954-02-23 nominal: no custSale: no vlaBuy: no vlaSell: yes 26984: 359159E 1956-01-26 nominal: yes custSale: no vlaBuy: no vlaSell: no 26857: 412345E 1958-07-28 nominal: yes custSale: no vlaBuy: yes vlaSell: no 26849: 423191E 1958-12-05 nominal: yes custSale: no vlaBuy: no vlaSell: yes 26987: 448583E 1959-12-14 nominal: no custSale: no vlaBuy: no vlaSell: no 26871: 450855E 1960-01-07 nominal: yes custSale: no vlaBuy: no vlaSell: yes 27564: 452250E 1960-01-18 nominal: no custSale: no vlaBuy: no vlaSell: no 26945: 467021E 1960-09-26 nominal: no custSale: no vlaBuy: no vlaSell: no 26956: 469702E 1960-11-17 nominal: yes custSale: no vlaBuy: no vlaSell: no 26944: 468183E 1960-11-30 nominal: yes custSale: no vlaBuy: no vlaSell: no 26955: 468184E 1960-11-30 nominal: no custSale: no vlaBuy: no vlaSell: no 26958: 484848E 1961-09-12 nominal: yes custSale: no vlaBuy: no vlaSell: no 26994: 507210E 1962-11-20 nominal: no custSale: no vlaBuy: no vlaSell: no 26959: 533338E 1964-04-09 nominal: no custSale: no vlaBuy: no vlaSell: no 27581: 538458E 1964-07-13 nominal: no custSale: no vlaBuy: no vlaSell: no 27582: 538459E 1964-07-13 nominal: no custSale: no vlaBuy: no vlaSell: no 27929: 608026E 1967-08-14 nominal: yes custSale: no vlaBuy: no vlaSell: no 27014: 627673E 1968-03-13 nominal: yes custSale: no vlaBuy: no vlaSell: no 27015: 627674E 1968-03-13 nominal: yes custSale: no vlaBuy: no vlaSell: no 27016: 627675E 1968-03-13 nominal: no custSale: no vlaBuy: no vlaSell: no 27571: 634784E 1968-06-18 nominal: no custSale: no vlaBuy: no vlaSell: no 27017: 634747E 1968-07-09 nominal: yes custSale: no vlaBuy: no vlaSell: no 27020: 640865E 1968-09-25 nominal: no custSale: no vlaBuy: no vlaSell: no 27654: 656339E 1969-03-10 nominal: no custSale: no vlaBuy: no vlaSell: no 26949: 658596E 1969-04-09 nominal: yes custSale: no vlaBuy: no vlaSell: yes 26970: 658597E 1969-04-09 nominal: no custSale: no vlaBuy: no vlaSell: no 27930: 660594E 1969-05-02 nominal: yes custSale: no vlaBuy: no vlaSell: no 27649: 675235E 1969-10-30 nominal: yes custSale: no vlaBuy: no vlaSell: no 27936: 683572E 1970-01-27 nominal: no custSale: no vlaBuy: no vlaSell: no 27021: 728793E 1971-09-21 nominal: yes custSale: no vlaBuy: no vlaSell: no 27667: G72055E 1971-10-29 nominal: yes custSale: no vlaBuy: no vlaSell: no 27668: G112249E 1972-02-01 nominal: yes custSale: no vlaBuy: no vlaSell: no 27712: H10857E 1972-02-24 nominal: yes custSale: no vlaBuy: no vlaSell: no 27714: H10856E 1972-02-24 nominal: yes custSale: no vlaBuy: no vlaSell: no 27673: H14697E 1972-02-28 nominal: yes custSale: no vlaBuy: no vlaSell: no 27729: H48509E 1972-05-30 nominal: no custSale: no vlaBuy: no vlaSell: no 27728: H42064E 1972-06-07 nominal: no custSale: no vlaBuy: no vlaSell: no 27730: H58317E 1972-06-20 nominal: no custSale: no vlaBuy: no vlaSell: no 27731: H60992E 1972-07-10 nominal: no custSale: no vlaBuy: no vlaSell: no 27734: H65306E 1972-07-24 nominal: no custSale: no vlaBuy: no vlaSell: no 27733: H90242E 1972-09-13 nominal: no custSale: no vlaBuy: no vlaSell: no 27677: H92378E 1972-09-15 nominal: yes custSale: no vlaBuy: no vlaSell: no 27940: H96413E 1972-09-27 nominal: no custSale: no vlaBuy: no vlaSell: no 27727: H101791E 1972-10-18 nominal: no custSale: no vlaBuy: no vlaSell: no 27759: H124001E 1972-12-06 nominal: no custSale: no vlaBuy: no vlaSell: no 27679: J5100E 1973-01-24 nominal: no custSale: no vlaBuy: no vlaSell: no 27766: J55243E 1973-06-06 nominal: no custSale: no vlaBuy: no vlaSell: no 27710: J71976E 1973-07-04 nominal: no custSale: no vlaBuy: no vlaSell: no 27763: J86568E 1973-07-06 nominal: no custSale: no vlaBuy: no vlaSell: no 27769: K20906E 1974-03-14 nominal: yes custSale: no vlaBuy: no vlaSell: no 27724: K68691E 1974-08-02 nominal: yes custSale: no vlaBuy: no vlaSell: no 27708: K85245E 1974-08-20 nominal: yes custSale: no vlaBuy: no vlaSell: no 27761: L91605E 1975-11-07 nominal: no custSale: no vlaBuy: no vlaSell: no To select T1 for this property: Get the latest non-nominal property FOR THE ORIGINAL BASE PROPERTY before the starting date (1942-03-04): 26675: 74159E 1927-10-16 nominal: no custSale: no vlaBuy: no vlaSell: no To select T2 for this property and its descendants we first look for any custodian seller titles in the period at all, nominal or not: 26811: 169945E 1944-04-17 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26812: 169952E 1944-04-18 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26816: 170828E 1944-04-25 nominal: no custSale: yes vlaBuy: yes vlaSell: no 26763: 171903E 1944-05-18 nominal: no custSale: yes vlaBuy: yes vlaSell: no Since we find some, we take the first in the list as our T2: 26811: 169945E 1944-04-17 nominal: no custSale: yes vlaBuy: yes vlaSell: no To get T3, we now get the next qualifying title (i.e. the next qualifying title which follows the custodian sale. That also happens to be a custodian sale, this time to the VLA: 26812: 169952E 1944-04-18 nominal: no custSale: yes vlaBuy: yes vlaSell: no To get T4, we do the same thing again: get the next qualifying title, which turns out to be another custodian sale: 26816: 170828E 1944-04-25 nominal: no custSale: yes vlaBuy: yes vlaSell: no To get T5, we get the last qualifying transaction on a descendant property which happens before the end date (1972-03-04): 27936: 683572E 1970-01-27 nominal: no custSale: no vlaBuy: no vlaSell: no This sequence makes sense according to the rules and the spreadsheet.
Met with JSR and we looked at the stats work. There appears to be an anomaly around the average 2018 price per metre for Maple Ridge properties in the 42-49 date-range, so we planned a diagnostic for that; I've now written that, and it seems like the anomaly must be due to some specific transactions from the period which don't happen to make it into the lot-based spreadsheet. More checking to do there. Meanwhile, I updated the data dictionary to include the VLA spreadsheet, and the new plan also includes a significant addition to the main lot-based spreadsheet.
Working with GL and GN, created an example TEI encoding for the Zotero data. Needs an ODD file w/schema and documentation, as well as Oxygen project.