Created a new spreadsheet for MO at JSR's request, based on titles sold by the Custodian and their preceding titles. May need to do some additional map work, to integrate multiple existing JSON files to create a single one for each row. 180 minutes.
The FODS tables to TEI tables conversion is now working; a quick spot check suggests that the results are correct, but we'll need to run some diagnostics on it both to ensure that the conversion worked correctly and to identify any oddities in the source data. Added a bunch of documentation, too.
Spent some time evaluating the spreadsheets and thinking through the best process for turning them into RADish. Got a basic build set up with a conversion process. So far, the process looks like so:
- Copy the files into a temporary directory and, in doing so, clean up the filenames (no brackets, no spaces)
- Take those files and convert them to FODS using soffice (there's lots there so it takes a while)
- Then, take those FODS and convert them into a TEI table; I'm still working on this bit, but, so far, I have a process whereby the XSLT gets all the FODS files in a particular directory (using collection()) and then combines them into a single TEI document with multiple tables. This might not be the right approach in the long run, but I think it makes the most sense for now, particularly if the various spreadsheets in a collection need to be reconciled
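A rough sketch of the first two steps, in Python rather than whatever the real build uses. The cleaning policy (drop brackets, turn runs of spaces into underscores), the function names and the directory arguments are all assumptions for illustration; the soffice flags are the standard headless conversion options.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from pathlib import Path


def clean_filename(name: str) -> str:
    """Drop brackets and replace runs of whitespace with underscores (assumed policy)."""
    no_brackets = re.sub(r"[\[\](){}]", "", name)
    return re.sub(r"\s+", "_", no_brackets.strip())


def soffice_command(src: Path, outdir: Path) -> list[str]:
    """Build the headless LibreOffice call that writes a .fods copy of src into outdir."""
    return ["soffice", "--headless", "--convert-to", "fods",
            "--outdir", str(outdir), str(src)]


def stage_and_convert(source_dir: Path, temp_dir: Path, fods_dir: Path,
                      run=subprocess.run) -> None:
    """Copy each spreadsheet under a cleaned name, then convert the copy to FODS."""
    temp_dir.mkdir(parents=True, exist_ok=True)
    fods_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue
        staged = temp_dir / clean_filename(src.name)
        shutil.copy2(src, staged)
        # One soffice call per file; this is the slow part when there are many files.
        run(soffice_command(staged, fods_dir), check=True)
```

The `run` parameter is only there so the conversion step can be swapped for a stub when testing without LibreOffice installed.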
Coded the third new spreadsheet, which wasn't as straightforward as the others. Updated documentation and data dictionary. The final spreadsheet has been dropped from the plan because it doesn't actually make sense.
After some debugging, I've got the last-custodian and no-custodian-t3 versions of the spreadsheets working (I think -- Jenkins is running now), and I've mapped out how the third variant should work. That third variant, as it's currently specified, makes no sense at all to me, though, so I've written to JSR to get some clarification before I put in the considerable amount of work necessary to create it. Meanwhile I can work on the new titles spreadsheet.
Parameterized a couple of functions and implemented the changes required to get #3 on plan 9 done (that's the simplest). Running the test now.
Today was finalizing plan #8 and completing it (more or less); then formulating plan #9:
***********************
*** Eighth plan modification: ***
1. DONE: Test the assumption that Maple Ridge properties were heavily transacted between the T2 in the current spreadsheet and 1949-04-01.
2. DONE: Add a column to the lot spreadsheet which shows the full consideration in 2018 dollars for each title. E.g. T1_CONSIDERATION_2018.
3. DONE: In the lot-based spreadsheet, add a T0, which is the last non-nominal transaction prior to T1 which includes an ancestor property of the root property (for all rows in the spreadsheet).
4. DONE: Add documentation of the VLA spreadsheet to the data dictionary.
5. DONE: Specify end date for T5s in data dictionary.
6. DONE: Fix the sequencing of titles in the BreezeMap output. (CAVEAT: there appears to be a small subset of titles for which this is not working, including 26684. I can't figure out why yet, but it seems to be working fine for the vast majority, so unless it's a priority I'll put it aside for now.)
***********************
This is plan #9, for your final approval:
***********************
*** Ninth plan modification: ***
1. Rename the original lot spreadsheet property_stats_data_by_lot_2018_orig.
2. Create a post-custodian sales spreadsheet of titles (property_stats_data_by_title_post_custodian_2018) like this:
   a) Get all custodian sales.
   b) For each custodian sale title, get the properties.
   c) For all those properties, get descending properties.
   d) Get all titles covering that expanded set of properties.
   e) Filter those titles to exclude any which pre-date the original custodian sale.
   f) Add the remaining titles to a list.
   g) For each distinct title in the list, ordered by title id, output a row.
   QUESTION: Do you care about nominal versus non-nominal in this spreadsheet?
3. Create a new version of the original lot spreadsheet (property_stats_data_by_lot_2018_no_cust_T3) in which T3 is never allowed to be a custodian sale. Call this the no-custodian-T3 rule.
4. Create another new version of the original lot spreadsheet (property_stats_data_by_lot_2018_last_cust) in which we always use the last transaction by the custodian rather than the first (the last-custodian rule).
5. Create a new version of the current lot spreadsheet (property_stats_data_by_lot_2018_strict_descent) where the descendant-property scenario is modified such that at each stage, we only retrieve descendants of properties in the preceding stage, rather than any descendant of the ur-property.
   QUESTION: does it matter that this will inevitably give rise to chains where the later titles have no land in common with the earlier titles?
   QUESTION: should this spreadsheet use the no-custodian-T3 rule?
   QUESTION: should this spreadsheet use the last-custodian rule?
*******************
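For reference, steps a) to g) of item 2 in plan #9 could be sketched roughly as below. This is only an illustration over toy in-memory data, not the actual implementation: the Title class, its field names and the children mapping (property id to direct descendant property ids) are all made up for the example. It also assumes the custodian sale title itself counts as not pre-dating the sale, so it stays in the output.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Title:
    """Hypothetical title record: id, registration date, properties covered."""
    title_id: int
    registered: date
    properties: frozenset[int]
    custodian_sale: bool = False


def descendants(roots, children):
    """Return the root properties plus every property descending from them."""
    seen = set(roots)
    stack = list(roots)
    while stack:
        prop = stack.pop()
        for child in children.get(prop, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def post_custodian_title_ids(titles, children):
    """Steps a) to g): distinct title ids at or after each custodian sale."""
    selected = set()
    for sale in (t for t in titles if t.custodian_sale):       # a)
        expanded = descendants(sale.properties, children)      # b), c)
        for t in titles:                                        # d)
            covers = bool(t.properties & expanded)
            if covers and t.registered >= sale.registered:     # e)
                selected.add(t.title_id)                        # f)
    return sorted(selected)                                     # g)


# Toy data: property 1 was later subdivided into properties 2 and 3.
titles = [
    Title(10, date(1940, 5, 1), frozenset({1})),                       # pre-dates the sale
    Title(20, date(1943, 3, 1), frozenset({1}), custodian_sale=True),  # the custodian sale
    Title(30, date(1950, 6, 1), frozenset({2})),                       # descendant property
    Title(40, date(1951, 1, 1), frozenset({9})),                       # unrelated property
]
children = {1: {2, 3}}
print(post_custodian_title_ids(titles, children))  # [20, 30]
```

The nominal versus non-nominal question is left open here; if it matters, it would be one more condition alongside the date check in step e).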