In breaks between workshop sessions, I tried to address the problem of the length of time queries are taking by using temporary tables constructed directly in the query instead of joining to views. The approach I took (joining to a temporary table created in the join clause) had no effect -- the query took exactly the same time. But there may be a way to create a temporary table once earlier in the query, and then join to it later. Still working on this.
Reworked VW_owners_full so that it includes more records and data (a regular join changed to a left join):
DROP VIEW IF EXISTS owners_full; CREATE VIEW VW_owners_full AS ( SELECT owners.*, owners_to_ethnicities.ote_ethnicity_id_fk, ethnicities.eth_name, owners_to_titles.ott_title_id_fk AS title_id, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date FROM owners_to_titles LEFT JOIN owners ON owners.own_owner_id = owners_to_titles.ott_owner_id_fk LEFT JOIN titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id order by owners.own_surname )
DROP VIEW IF EXISTS VW_trans_composite_eth_prop; CREATE VIEW VW_trans_composite_eth_prop AS ( SELECT seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code, buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code, DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days, census_tracts.census_tract_code, props.*, sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers, sellers.munged_ethnicity AS seller_munged_eth, sellers.total_institutional AS institutional_sellers, buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers, buyers.munged_ethnicity AS buyer_munged_eth, buyers.total_institutional AS institutional_buyers FROM titles AS buyer_titles LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY seller_titles.ttl_title_id )
Revised the VW_titles_composite_eth view so that it includes even owners who have no ethnicity (are institutional):
DROP VIEW IF EXISTS VW_titles_composite_eth; CREATE VIEW VW_titles_composite_eth AS ( SELECT titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners, SUM(owners.own_institutional) AS total_institutional, CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids, group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities, (SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner, (COUNT(owners.own_owner_id)) AS total_owners, (SELECT CASE WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a" WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d" ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1) END) AS munged_ethnicity FROM titles LEFT JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id GROUP BY titles.ttl_title_id )
The latest requirement is a field containing a comma-separated list of the owner names where the owners are institutional.
JS-R needed to add the full property information to the transactions view, so I've added a new view:
DROP VIEW IF EXISTS VW_trans_composite_eth_prop; CREATE VIEW VW_trans_composite_eth_prop AS ( SELECT seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code, buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code, DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days, census_tracts.census_tract_code, props.*, sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers, sellers.munged_ethnicity AS seller_munged_eth, buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers, buyers.munged_ethnicity AS buyer_munged_eth FROM titles AS buyer_titles LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id LEFT JOIN props AS props ON seller_titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY seller_titles.ttl_title_id )
This takes even longer to generate than the original, unfortunately.
Had to use a separate view to aggregate owners against titles. It's working, but it's mighty slow!
DROP VIEW IF EXISTS VW_trans_composite_eth; CREATE VIEW VW_trans_composite_eth AS ( SELECT seller_titles.ttl_title_id AS seller_title_id, seller_titles.ttl_date AS seller_title_date, seller_titles.ttl_title_code AS seller_title_code, buyer_titles.ttl_title_id AS buyer_title_id, buyer_titles.ttl_date AS buyer_title_date, buyer_titles.ttl_title_code AS buyer_title_code, DATEDIFF(buyer_titles.ttl_date, seller_titles.ttl_date) as seller_duration_days, census_tracts.census_tract_code, sellers.concat_owners AS concat_sellers, sellers.concat_ethnicities AS seller_ethnicities, sellers.total_owners AS total_sellers, sellers.munged_ethnicity AS seller_munged_eth, buyers.concat_owners AS concat_buyers, buyers.concat_ethnicities AS buyer_ethnicities, buyers.total_owners AS total_buyers, buyers.munged_ethnicity AS buyer_munged_eth FROM titles AS buyer_titles LEFT JOIN titles_to_prectitles ON buyer_titles.ttl_title_id = titles_to_prectitles.ttp_title_id_fk LEFT JOIN titles AS seller_titles ON seller_titles.ttl_title_id = titles_to_prectitles.ttp_prectitle_id_fk LEFT JOIN VW_titles_composite_eth AS sellers ON seller_titles.ttl_title_id = sellers.ttl_title_id LEFT JOIN VW_titles_composite_eth AS buyers ON buyer_titles.ttl_title_id = buyers.ttl_title_id LEFT JOIN props ON seller_titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id ORDER BY seller_titles.ttl_title_id )
I've completed two of the three tasks from this morning: the VW_sellers_duration view has been modified to rename its fields, and the VW_mortgate_ethnicity view has been created.
DROP VIEW IF EXISTS VW_mortgage_ethnicity; CREATE VIEW VW_mortgage_ethnicity AS (SELECT mortgages.*, titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, census_tracts.census_tract_code, CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners, CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids, group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities, (SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner, (COUNT(owners.own_owner_id)) AS total_owners, (SELECT CASE WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a" WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d" ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1) END) AS munged_ethnicity FROM mortgages LEFT JOIN mortgages_to_titles ON mortgages.mgg_mortgage_id = mortgages_to_titles.mtt_mortgage_id_fk LEFT JOIN titles on mortgages_to_titles.mtt_title_id_fk = titles.ttl_title_id LEFT JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id LEFT JOIN props ON titles.ttl_property_id_fk = props.prp_property_id LEFT JOIN census_tracts ON props.prp_census_tract_id_fk = census_tracts.census_tract_id GROUP BY mortgages.mgg_mortgage_id)
I'm now working on the transaction view, but I've run into an issue variously known as the "Cartesian product" or "Cascading aggregates" problem: in simple terms, when I aggregate both the buyers and the sellers in a single row, I end up with multiple instances. For example, if there are two buyers and three sellers, then each buyer appears three times and each seller twice, for spurious totals of six buyers and six sellers.
What this means is that my strategy for munging buyers OR sellers works only when we're only looking at one of them; when we need to munge both of them in the same row, a different strategy has to be adopted. Right now, I don't know what that strategy might be, but I'm doing some research.
Met with JS-R and worked through requirements for new views of the data:
Started experimenting with building a view of the titles table which will include a sort of composite ethnicity column, as detailed in the preceding post. I now have the key owner and ethnicity data munged together in a view created like this:
CREATE VIEW VW_titles_composite_eth AS (SELECT titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, CAST(group_concat(owners.own_owner_id separator ', ') AS CHAR) AS concat_owners, CAST(group_concat(owners_to_ethnicities.ote_ethnicity_id_fk separator ', ') AS CHAR) AS concat_eth_ids, group_concat(ethnicities.eth_name separator ', ') AS concat_ethnicities, (SELECT IF (COUNT(owners.own_owner_id) > 1, 'multiple', 'single')) AS multi_owner, (COUNT(owners.own_owner_id)) AS total_owners, (SELECT CASE WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" THEN "c" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Japanese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "b" WHEN group_concat(ethnicities.eth_name separator ', ') LIKE "%Chinese%" AND (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "a" WHEN (SELECT COUNT(DISTINCT(ethnicities.eth_name))) > 1 THEN "d" ELSE SUBSTRING_INDEX(group_concat(ethnicities.eth_name separator ', '), ',', 1) END) AS munged_ethnicity FROM titles JOIN owners_to_titles ON owners_to_titles.ott_title_id_fk = titles.ttl_title_id JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk JOIN ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = ethnicities.eth_ethnicity_id GROUP BY titles.ttl_title_id)
This has the following values, as previously requested:
JS-R has made the following request for a new column in the titles table, which would handle a kind of munged-together value for all the owners' ethnicities. It would be a string field, presumably (unless we want to give these numbers to keep them short).
Maybe we need two new variables as characteristics of titles:
1. Multiple or single owner:
a. multiple
b. single
2. Ownership ethnicity (don't worry here whether they are individuals or groups):
a. Mixed ethnicity with Chinese
b. Mixed Ethnicity with Japanese (include Japanese provisional)
c. Mixed ethnicity with Chinese and Japanese
d. Mixed ethnicity--ALL Others (all not included above, including mixes that include the Asia: other category but neither Japanese or Chinese, if they include both Asia other and Japanese or Chinese, categorize as indicated above)
e. ELSE--> copy the single ethnicity (or shared ethnicity) here
It turns out that this one should be buyers instead of vendors: Vendor ethnicity by mortgage institutional or not (we will fill in this data as we go) (with analysis possible by date and by sample). So here we go with:
Buyer ethnicity by mortgage institutional or not (we will fill in this data as we go) (with analysis possible by date and by sample)
Here's the SQL:
CREATE VIEW VW_buyers_and_lenders AS (SELECT titles.ttl_title_id, titles.ttl_title_code, titles.ttl_date, owners.own_owner_id, owners.own_surname, owners.own_forenames, buyer_ethnicities.eth_name AS buyer_ethnicity, mortgages.mgg_mortgage_id, mortgages.mgg_doc_num, mortgages.mgg_value, mortgages.mgg_interest, lenders.lnd_display_name, lenders.lnd_individual, lender_ethnicities.eth_name AS lender_ethnicity FROM titles JOIN owners_to_titles ON titles.ttl_title_id = owners_to_titles.ott_title_id_fk LEFT JOIN owners ON owners_to_titles.ott_owner_id_fk = owners.own_owner_id LEFT JOIN owners_to_ethnicities ON owners.own_owner_id = owners_to_ethnicities.ote_owner_id_fk LEFT JOIN ethnicities AS buyer_ethnicities ON owners_to_ethnicities.ote_ethnicity_id_fk = buyer_ethnicities.eth_ethnicity_id JOIN mortgages_to_titles ON titles.ttl_title_id = mortgages_to_titles.mtt_title_id_fk LEFT JOIN mortgages ON mortgages_to_titles.mtt_mortgage_id_fk = mortgages.mgg_mortgage_id LEFT JOIN lenders_to_mortgages ON mortgages.mgg_mortgage_id = lenders_to_mortgages.ltm_mortgage_id_fk LEFT JOIN lenders ON lenders_to_mortgages.ltm_lender_id_fk = lenders.lnd_lender_id LEFT JOIN lenders_to_ethnicities ON lenders.lnd_lender_id = lenders_to_ethnicities.lte_lender_id_fk LEFT JOIN ethnicities AS lender_ethnicities ON lenders_to_ethnicities.lte_ethnicity_id_fk = lender_ethnicities.eth_ethnicity_id)
I've included lender ethnicities and lender "individual", even though no data has been entered for those yet.
We now have six views, which I've renamed to make them more obviously views in the phpMyAdmin interface:
I think these cover all the current needs. VW_owners_full is just a view of all the core data about owners; I was intending to use it as part of other views, but didn't need to. I'll leave it there in case it's useful.