Created a new view for sellers with full details of the transaction, but also including ownership duration:
CREATE VIEW VW_sellers_duration AS (SELECT old_titles.ttl_title_id AS old_title_id, old_titles.ttl_date AS old_title_date, old_titles.ttl_title_code AS old_title_code, new_titles.ttl_title_id AS new_title_id, new_titles.ttl_date AS new_title_date, new_titles.ttl_title_code AS new_title_code, DATEDIFF(new_titles.ttl_date, old_titles.ttl_date) as duration_days, census_tracts.census_tract_code, sellers.own_owner_id AS seller_id, sellers.own_surname AS seller_surname, sellers.own_forenames AS seller_forenames, seller_eth.eth_name as seller_ethnicity FROM titles AS new_titles JOIN titles_to_prectitles ON new_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk JOIN titles AS old_titles ON old_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk JOIN owners_to_titles AS sellers_to_titles on old_titles.ttl_title_id = sellers_to_titles.ott_title_id_fk JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id LEFT JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id LEFT JOIN props ON old_titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id)
This slightly amended version gets all buyers linked to a title even where other info is missing:
SELECT buyers.own_owner_id AS buyer_id, buyers.own_surname AS buyer_surname, buyers.own_forenames AS buyer_forenames, buyer_eth.eth_name AS buyer_ethnicity, census_tracts.census_tract_code, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, titles_to_prectitles.ttp_prectitle_id_fk, sellers.own_owner_id AS seller_id, sellers.own_display_name AS seller_name, seller_eth.eth_name AS seller_ethnicity FROM owners AS buyers LEFT JOIN owners_to_titles on buyers.own_owner_id = owners_to_titles.ott_owner_id_fk LEFT JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id LEFT JOIN titles_to_prectitles ON titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk LEFT JOIN owners_to_ethnicities ON buyers.own_owner_id = owners_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities AS buyer_eth ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_eth.eth_ethnicity_id JOIN owners_to_titles AS sellers_to_titles ON titles_to_prectitles.ttp_prectitle_id_fk = sellers_to_titles.ott_title_id_fk LEFT JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id LEFT JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id LEFT JOIN props ON titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY buyers.own_surname
Here are the latest requests:
I have code elsewhere on the blog which is close to the second of these. The first is more convoluted.
The next view is this one:
Vendor ethnicity by buyer ethnicity (with analysis possible by date and by sample)
For this we need an individual "transfer" record which includes the following columns:
I could start by creating a comprehensive view of owners, though; I could then join to this view in the creation of other views, making the process a bit more modular. Here's the sql:
DROP VIEW IF EXISTS owners_detail CREATE VIEW owners_full AS (SELECT owners.*, owners_to_ethnicities.ote_ethnicity_id_fk, ethnicities.eth_name, owners_to_titles.ott_title_id_fk AS title_id, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date FROM (((owners RIGHT JOIN owners_to_titles ON owners.own_owner_id = owners_to_titles.ott_owner_id_fk) JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id) JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk) JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id ORDER BY owners.own_surname)
However, I ended up building the entire thing as a single view, like this:
DROP VIEW IF EXISTS buyers_and_sellers CREATE VIEW buyers_and_sellers AS (SELECT buyers.own_owner_id AS buyer_id, buyers.own_surname AS buyer_surname, buyers.own_forenames AS buyer_forenames, buyer_eth.eth_name AS buyer_ethnicity, census_tracts.census_tract_code, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, titles_to_prectitles.ttp_prectitle_id_fk, sellers.own_owner_id AS seller_id, sellers.own_display_name AS seller_name, seller_eth.eth_name AS seller_ethnicity FROM owners AS buyers JOIN owners_to_titles on buyers.own_owner_id = owners_to_titles.ott_owner_id_fk JOIN titles on owners_to_titles.ott_title_id_fk = titles.ttl_title_id JOIN titles_to_prectitles ON titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk JOIN owners_to_ethnicities ON buyers.own_owner_id = owners_to_ethnicities.ote_owner_id_fk JOIN ethnicities AS buyer_eth ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_eth.eth_ethnicity_id JOIN owners_to_titles AS sellers_to_titles ON titles_to_prectitles.ttp_prectitle_id_fk = sellers_to_titles.ott_title_id_fk JOIN owners AS sellers ON sellers_to_titles.ott_owner_id_fk = sellers.own_owner_id JOIN owners_to_ethnicities AS sellers_to_ethnicities ON sellers.own_owner_id = sellers_to_ethnicities.ote_owner_id_fk JOIN ethnicities AS seller_eth ON sellers_to_ethnicities.ote_ethnicity_id_fk = seller_eth.eth_ethnicity_id JOIN props ON titles.ttl_property_id_fk = props.prp_property_id JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY buyers.own_surname)
Right now this brings back 500 records, so I assume there are only 500 instances where we have a transfer from one person to another in which we know both the buyer and the seller. However, I need to check this carefully; I'll also probably have to loosen up the query to get us transfers for which less info is available.
The next view I've completed is the following:
Total transfers by property (sortable by decade and by sample)
because we have all the data for it.
SELECT props.*, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code FROM (`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY titles.ttl_date
SELECT props.*, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code, titles_to_prectitles.ttp_prectitle_id_fk AS `prectitle_id` FROM ((`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id) RIGHT JOIN titles_to_prectitles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id ORDER BY titles.ttl_date
DROP VIEW IF EXISTS trans_by_property CREATE VIEW trans_by_property AS (SELECT props.prp_property_id, props.prp_property_code, props.prp_district, props.prp_plan, props.prp_block, props.prp_lot, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code, titles_to_prectitles.ttp_prectitle_id_fk AS `prectitle_id` FROM ((`props` RIGHT JOIN titles on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id) RIGHT JOIN titles_to_prectitles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id ORDER BY props.prp_property_id)
Sent an XLS dump of this view to JS-R, and also created a user for him so he can read and export views himself.
We're beginning the process of creating specific views which can be used for statistical analysis of the data. This is the set of views requested by JS-R:
Now we have to formalize those in terms of the various tables and fields that need to be involved. This is my first attempt at the first view:
CREATE VIEW trans_by_date AS SELECT titles_to_prectitles.ttp_ttp_id, titles.ttl_title_id, titles_to_prectitles.ttp_prectitle_id_fk, titles.ttl_title_code, titles.ttl_date, props.prp_census_tract_id_fk, census_tracts.census_tract_code FROM ((titles_to_prectitles JOIN titles on titles_to_prectitles.ttp_title_id_fk = titles.ttl_title_id) JOIN props on titles.ttl_property_id_fk = props.prp_property_id) JOIN census_tracts on props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY titles.ttl_date
With help from JN, this is the SQL that creates links where there is a match:
INSERT INTO `titles_to_prectitles` (`ttp_title_id_fk`, `ttp_prectitle_id_fk`) SELECT `tt2`.`ttl_title_id` AS `id_1` , `tt1`.`ttl_title_id` AS `id2` FROM `titles` AS `tt1` INNER JOIN `titles` AS `tt2` ON tt1.ttl_title_code = tt2.ttl_traces
This checks that there are no duplicate records in the linking table (in case the manual data entry had already created one of the ones we've just created):
SELECT ttp1.ttp_ttp_id, ttp2.ttp_ttp_id FROM titles_to_prectitles ttp1, titles_to_prectitles ttp2 WHERE ttp1.ttp_title_id_fk = ttp2.ttp_title_id_fk and ttp1.ttp_prectitle_id_fk = ttp2.ttp_prectitle_id_fk and ttp1.ttp_ttp_id < ttp2.ttp_ttp_id
This clears out the Traces field wherever these links have been created:
UPDATE titles ttl1 SET ttl_traces = "" WHERE ttl_traces IN (SELECT * FROM (SELECT ttl_title_code FROM titles ttl2) as codes)
Note that the nested subquery is necessary because MySQL won't allow you to use the table you're updating in a WHERE clause.
These are the results:
From a total of 2274 titles:
1115 titles still have data in their Traces field. Some of these items might be linkable by a human. For instance:
Obviously these sorts of decisions can only be made by a human.
This simpler query shows up more potential hits:
SELECT * FROM `titles` AS `tt1` WHERE EXISTS ( SELECT * FROM `titles` WHERE tt1.ttl_title_code = titles.ttl_traces )
This query brings back the candidate ids:
SELECT `tt1`.`ttl_title_id` AS `id_1`, `tt2`.`ttl_title_id` AS `id2` FROM `titles` AS `tt1` LEFT JOIN `titles` AS `tt2` ON tt1.ttl_title_code = tt2.ttl_traces
We'll put more work into this tomorrow.
Came to automate the linking of titles to preceding titles as specified in the previous post, but found that with this data, manual intervention was essential. I initially isolated 38 candidate records like this:
SELECT `ttl_title_id` FROM `titles` AS `ts1` WHERE NOT EXISTS (SELECT * FROM `titles_to_prectitles` WHERE `ttp_title_id_fk` = `ts1`.`ttl_title_id`) AND EXISTS (SELECT `ttl_title_id` FROM `titles` AS `ts2` WHERE TRIM(`ts1`.`ttl_traces`) = `ts2`.`ttl_title_id`) ORDER BY `ts1`.`ttl_title_id`
But then I looked at each candidate individually, because in many cases, there was only a partial entry in the Traces field. For instance, looking at a record with "2144K" in the Traces field, we find there are two possible candidates for the precedint title, 2144K-19 and 2144K-21. We can select the correct one by checking the Property field, so (for instance) we can tell that if the Property field for the candidate record has 254 B:39 L:19, and 2144K-19 also has 254 B:39 L:19, then that's the correct preceding title.
With this method I was able to match nearly all the 38 candidates to the correct preceding record(s).
We end up with the following:
Don't know how useful this will be.
The manual work on preceding titles has been done, so the automated component can now be undertaken. This is the task:
Caveat: there are some items where there are brackets around one or more characters; we'll probably have to ignore those (and lookups will fail anyway).