Ongoing minor adjustments to the AtoM interface per NY, and a new field in the db for the RAs in Maple Ridge.
Had to do a bit of research to answer some questions from the AtoM team; one is still outstanding: I've asked on the AtoM group how to delete an item from the Level of Description field, and am waiting for an answer.
Added three new layers to the Read map from images supplied by the GIS team; two of them really should be vectors, so we're going to look at the possibility of building a real GIS map rather than a Zoomify thing. Also added the Police ID field to the Owners table in the Maple Ridge db (tested in dev, ported to live). Put together a proposal for the "bundle" documents we discussed the other day and circulated it. Looks messy and may need two new tables.
Our current task involves trying to merge the information about building size and usage, which GIS has gathered and organized by partition, with buyer/seller info from the titles data, which is organized by lot/title.
I've done a preliminary merge of the building usage data in a processed version of one of JSR's SPSS spreadsheets. You'll see that each row has four new columns:
BUILDING_USAGE_1913 BUILDING_USAGE_1930 BUILDING_USAGE_1944 BUILDING_USAGE_1956
Each cell in those columns contains the amalgamated building usage data, for that column's year, for every partition that is at least partly in that row's lot.
Caveats: I haven't had a chance to do more than a cursory check that the values match those I'd expect to see from the GIS folks' spreadsheets; and there is no way of knowing whether we have block/lot uniqueness problems for this particular data set. If two distinct properties have the same block and lot, their data would be amalgamated and we wouldn't know.
This is done by a multi-stage process, using XSLT, in the svnrepo/trunk/maps/gis_xml folder:
- Take the CSV from JSR and save it as FODS, then expand the FODS to make it more easily processable (I do this in lots of projects; it involves turning single table:table-cell elements with a multi-column span into individual copies, so it's easier to iterate through them).
- Take the four GIS files in CSV and turn them into expanded FODS in the same way.
- Take the expanded GIS files and merge them into a single XML file, in a custom format, expanding each multi-lot partition record into a separate record for each lot. This is needed because they record information for each partition only once, but partitions frequently cover multiple lots.
- Iterate through the rows of the SPSS spreadsheet (one lot per row) and, for each year, look up all the partitions which are in or overlap that lot and retrieve their usage data; then merge it, eliminating duplicate values.
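The last two steps can be sketched outside XSLT as follows. This is a minimal Python illustration of the logic only, not the actual stylesheets: the record layout, the field names (block, lots, usage), the sample values and the "; " separator are all assumptions made for the example.

```python
from collections import defaultdict

YEARS = (1913, 1930, 1944, 1956)

# Hypothetical partition records as they might look once the GIS files are
# read: one record per partition, possibly covering several lots.
partitions = [
    {"year": 1913, "block": "52", "lots": ["7", "8"], "usage": "dwelling"},
    {"year": 1913, "block": "52", "lots": ["8"], "usage": "cafe"},
    {"year": 1913, "block": "52", "lots": ["8"], "usage": "dwelling"},
    {"year": 1930, "block": "52", "lots": ["7"], "usage": "store"},
]

def expand_by_lot(records):
    """Yield one record per (partition, lot) pair."""
    for rec in records:
        for lot in rec["lots"]:
            yield {"year": rec["year"], "block": rec["block"],
                   "lot": lot, "usage": rec["usage"]}

def index_usage(records):
    """Map (block, lot, year) to usage values, first-seen order, no duplicates."""
    index = defaultdict(list)
    for rec in expand_by_lot(records):
        key = (rec["block"], rec["lot"], rec["year"])
        if rec["usage"] not in index[key]:
            index[key].append(rec["usage"])
    return index

def add_usage_columns(rows, index, sep="; "):
    """Return copies of the rows with one BUILDING_USAGE_<year> column per year."""
    merged = []
    for row in rows:
        new_row = dict(row)
        for year in YEARS:
            values = index.get((row["block"], row["lot"], year), [])
            new_row[f"BUILDING_USAGE_{year}"] = sep.join(values)
        merged.append(new_row)
    return merged

# Hypothetical rows standing in for the lot/title spreadsheet.
title_rows = [{"block": "52", "lot": "7"}, {"block": "52", "lot": "8"}]
result = add_usage_columns(title_rows, index_usage(partitions))
```

Because the lookup key here is just block, lot and year, two distinct properties sharing a block and lot would be silently amalgamated, which is the uniqueness caveat noted above.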
Having got this far, it will be relatively easy to add other info such as building name (stuff like "dwelling" and "cafe"), and area; but the problem with area is that we have no mechanism for distributing the area of multilot partitions across the lots.
Awaiting feedback from JSR before proceeding any further.
Spent most of the day trying to merge the original GIS cluster data, which I have in XML from JC from a couple of years ago, with JSR's current spreadsheet data. It's feasible, but it turns out that the old data is a bit ropy (errors and inconsistencies), and that it's actually missing some features and changes they made later. Meanwhile, all they've been able to give me so far is a big dump of an Access db with many tables, which doesn't seem to contain the data we're looking for (although it might, because even in Access it seems impossible to search it in any useful way). Hopefully they can get me what I need out of Arc, but if not, there's a long and tedious job ahead wrangling this horrible format.
Notes from Landscapes Skype:
Problem: Legal description changes on a subsequent document. Example is addition of a parcel number on a later title, where the same descriptor is added to the plan, presumably for the purposes of clarification. Decision: Add the parcel number in the parcel field, and then make a note to the effect that the parcel descriptor was added at the time of a specific title. Add a checkbox to the properties table: "Property description changed."
Problem: Changes to institutional information over time, or changing descriptions of institutional information, do show up. Should we keep the same record or create new ones, as we do with owners? Decision: make distinct records for each variant of an institution.
Problem: Institutions with people attached to them: should we have a "named agent" field in the institutional owner table? A simpler solution is just to add the agent name in parentheses in the institution name field.
New table for bundle documents? Still under discussion. I will put together one or more proposals for this table.
Add a field to the owners table for Police ID # (for Japanese Canadians).
I've given SA and SB read-access to the new Maple Ridge database, using the new read-only interface I added last semester. This'll be its first live use. Basically I have one .htaccess file in the root folder of the project giving access to both editors and readers, and another in the editors folder, giving only the editors access. The readers don't have the editors' credentials, so they can't do anything but read, and their interface presents only read components and no editing facilities.
This will enable them to see more clearly what the data for Maple Ridge looks like as it comes in.
Previous discussions of the alignment and unique identification of properties across the GIS and Land Titles clusters, for Powell Street, settled on the use of Block, Lot and Plan. However, looking at the actual data in the Properties table, it's clear that Block is rarely present, and the descriptor information is spread out across eleven fields, with different properties using their own idiosyncratic combinations of those fields. Obviously what worked for Powell Street will be hopeless here.
Since we're right at the beginning of the process, I'd really like to get a system set up so that both clusters are using the same identifiers. There are three possibilities that I see:
- Whenever the GIS folks are mapping a property, they look at the Land Titles database to find that record, and use the id field (the numeric identifier at the far left of the table) as their identifier. This id is guaranteed to be unique, and is never repeated (if a property is deleted, its id is retired by the database and never re-used).
- ADVANTAGES: We're all on the same page, and uniqueness involves no fancy calculation or combination of fields.
- DISADVANTAGES: The GIS folks won't be able to assign an id to their property shapes until the Land Title folks happen to have entered that property in the db.
- We devise some ugly conditional combination of the eleven fields constituting the descriptor, and agree to use that. We're already doing something like that in order to make it possible for our folks to select the correct property when assigning properties to titles.
- ADVANTAGES: It would be to some extent human-readable (i.e. you could figure out what it means by looking at it).
- DISADVANTAGES: It would be long, and it would be very variable, depending on what fields are filled in. Also, there's no guarantee it would actually be unique; the Land Titles folks would have to notice duplication in the drop-down lists they use, and fix it by adding an additional descriptor (we've just worked out a protocol for this).
- The GIS folks themselves devise unique identifiers for their shapes, based on an algorithm of their own, and then periodically one member of each team works through the properties in the database and adds the GIS identifier for each property to a new field in the Properties table.
- ADVANTAGES: GIS can proceed in generating their own ids without looking at the Land Titles data. Having people from the two teams work together periodically might be good for integration as a whole.
- DISADVANTAGES: Potential for id duplication, since the database is not policing uniqueness of id; the requirement for periodic collaboration might be onerous and take some setting up, so it might be neglected and the alignment between datasets could fall behind.
I'm rather in favour of #1. Waiting to hear back from everyone after laying out these options.
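To illustrate why option #2 offers no uniqueness guarantee, here is a minimal Python sketch of a conditional composite identifier. The field names are invented for the example (the real Properties table has eleven descriptor fields, not listed here), and the key format is an assumption, not the scheme we already use in the drop-down lists.

```python
from collections import defaultdict

# Hypothetical descriptor field names, invented for illustration only.
DESCRIPTOR_FIELDS = ("district_lot", "block", "lot", "parcel", "plan")

def composite_key(prop):
    """Join whichever descriptor fields are filled in, labelled by field name."""
    parts = [f"{name}={prop[name]}" for name in DESCRIPTOR_FIELDS if prop.get(name)]
    return "|".join(parts)

def find_collisions(properties):
    """Return {composite key: [ids]} for keys shared by more than one property."""
    by_key = defaultdict(list)
    for prop in properties:
        by_key[composite_key(prop)].append(prop["id"])
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}

# Made-up sample records; the numeric id plays the role of the database id.
properties = [
    {"id": 101, "district_lot": "398", "lot": "4", "plan": "1161"},
    {"id": 102, "district_lot": "398", "lot": "4", "plan": "1161"},
    {"id": 103, "district_lot": "398", "parcel": "A"},
]
collisions = find_collisions(properties)
```

The key is readable, but its length and shape vary with whichever fields happen to be filled in, and the first two sample records collide even though their database ids differ. With option #1 the numeric id alone distinguishes them.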
I amended the script described here to add an identical dump of the new Maple Ridge database, and tested processing of it into more usable versions of XML. Everything works fine except that the property descriptions do not naturally convert themselves to the Block_Lot format we've been using up to now; most don't have block or lot descriptors. We'll need to figure out an alternative, and we should probably do that with the GIS folks involved.
I had neglected to add one new feature requested by the RAs: a more detailed auto-generated desc field for the lawyers table. Took advantage of this to test the live-to-dev script for copying data over; then implemented and tested the feature in the dev db, and ported it to live. Also regenerated all the existing descs manually.
Per JSR, based on the table here, I calculated the new conversion rates for each year to 2016 values (divide the Jan 1 CPI for 2016 by the CPI for the target year) and built the results into the build process. The old 2014 modules are still there, but the build process should now use the 2016 modules and produce a file which is named for 2016.
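The calculation itself is simple enough to sketch in a few lines of Python. The CPI figures below are placeholders invented for the example, not the values from the table, and the function names are hypothetical; the real build process uses its own modules.

```python
BASE_YEAR = 2016

# Placeholder CPI figures, invented for illustration only; they are NOT the
# values from the source table.
cpi = {1913: 6.0, 1930: 9.0, 1944: 7.5, 2016: 126.0}

def conversion_rates(cpi_by_year, base_year=BASE_YEAR):
    """Rate for each year = CPI of the base year / CPI of that year."""
    base = cpi_by_year[base_year]
    return {year: base / value for year, value in cpi_by_year.items()}

def to_base_year_dollars(amount, year, rates):
    """Convert an amount from the given year into base-year dollars."""
    return amount * rates[year]

rates = conversion_rates(cpi)
converted = to_base_year_dollars(1000, 1913, rates)
```

With these made-up figures, an amount from 1913 is multiplied by 126.0 / 6.0 = 21 to express it in 2016 dollars, and the rate for 2016 itself is 1.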