CityStats: we have a plan...
For the last couple of days, I've been thinking about the best way to optimize the existing D query, which is a bit slow, and also to enable us to do the other two calculations we need to do for our combined ethnicities without making the whole show painfully slow. I now have a solution, using temporary tables.
The SQL block below will do the following:
- Create a temporary table for one city/year, which is deleted when it goes out of scope (i.e. when the current connection is terminated).
- Populate the table by creating two constructed ethnicities, one by combining ethnicities 23 and 24, and the other combining 35 and 36.
- For each combined ethnicity, in each of the specified tracts, create a new row which sums the
o_tot
values in the original table.
The effect of this is to give us a new table in which the new, constructed ethnicities are represented in sum, as if they were original ethnicities. This table is analogous to any of the original tables -- it has the same structure -- and therefore JD's original, slick and fast SQL can be run against this table to do the three calculations we need to do. The value of o
in the code below is negative, to distinguish it from any of the original, genuine ethnicity id codes. It would also be straightforward to add a new column in the constructed table which contains the user-supplied name for the constructed ethnicity, making it easy to retrieve in the subsequent query.
I now have to take this code forward by first using this approach to create the existing D table, alongside the one created by my current code; ensure that the values resulting from this approach are the same as the old ones; and then test each approach to confirm what I'm confident will be the case, that the new approach is significantly faster.
This is the SQL:
CREATE TEMPORARY TABLE `test` ( SELECT SUM( `o_tot` ) AS `o_tot` , "-10" AS `o` , `ct` , `ct_tot` FROM `ct_1961_205_sc` WHERE (`o` IN ("22", "23")) AND `ct` IN ("1.000", "2.000") GROUP BY `ct` ) UNION ( SELECT SUM( `o_tot` ) AS `o_tot` , "-20" AS `o` , `ct` , `ct_tot` FROM `ct_1961_205_sc` WHERE (`o` IN ("35", "36")) AND `ct` IN ("1.000", "2.000") GROUP BY `ct` ); SELECT * FROM `test`;
The last bit, run in phpMyAdmin, shows the temporary table contents.