CityStats: First table output working; many hurdles
Posted by mholmes on 09 Dec 2008 in Activity log
Spent the whole day on this, and I've got as far as producing the new D-table output. The other data is also being calculated, but is not output yet. Here are some of the annoyances and hurdles:
- A temporary table in MySQL cannot be referred to more than once in the same query, so temporary tables are useless for our purposes. I therefore have to create full tables to run our queries on, and drop them explicitly at the end. This has a useful side-effect, though: they're still around to be examined if the process dies in the middle.
- It turns out that my original plan, to calculate values on a much-reduced table consisting only of the ethnicities we're grouping, will not really work. The other ethnicities need to be there so that tract totals can be calculated even when a tract has no residents from any of our groupings. The table construction process is therefore much more complicated than I thought: it involves a union of the summed/grouped tract rows for our grouped ethnicities with all of the rows for ethnicities which are NOT grouped. The resulting tables can be very large, of course, amounting to almost full copies of the original tables, and querying them takes a while, so it makes sense to create indexes on them, just as on the regular tables. Once this is done, processing is considerably faster.
- My new calculations, which are based entirely on reconstructed tables that mimic the originals, along with calculation code based on JD's, are producing results which are basically the same as those from my original complicated D-table calculation, which JS-R thought might be off. We need to look at this, but the fact that I've done the same calculations two completely different ways and gotten the same result suggests that the original might have been right after all.
- The ordering of the D-table cells may not be right yet (either that or the ordering of my original table wasn't right). In the second column group of a two-city result table, the order of row data for the ethnicities is switched around compared to the original table. That shouldn't be too hard to figure out, though.
- Instead of using negative indices for the grouped ethnicities, I've elected to start them from 10,000, which is way above any index that could occur in the existing data. It makes for easier access and sorting.
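As a rough illustration of the working-table approach described in the list above, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs on its own. The table name `counts`, the columns `tract_id`, `ethnicity_id` and `residents`, the working table `work_counts`, and all of the function names are hypothetical, not the project's actual schema or code.

```python
import sqlite3

# Grouped ethnicities get ids starting at 10,000, as described in the post.
GROUP_ID_BASE = 10_000


def build_working_table(conn, grouped_ids, group_id=GROUP_ID_BASE):
    """Create a full (non-temporary) table: one summed row per tract for the
    grouped ethnicities, plus every row for the ungrouped ones."""
    marks = ",".join("?" for _ in grouped_ids)
    conn.execute("DROP TABLE IF EXISTS work_counts")
    conn.execute(
        "CREATE TABLE work_counts "
        "(tract_id INTEGER, ethnicity_id INTEGER, residents INTEGER)"
    )
    conn.execute(
        f"""
        INSERT INTO work_counts (tract_id, ethnicity_id, residents)
        SELECT tract_id, ?, SUM(residents)
          FROM counts
         WHERE ethnicity_id IN ({marks})
         GROUP BY tract_id
        UNION ALL
        SELECT tract_id, ethnicity_id, residents
          FROM counts
         WHERE ethnicity_id NOT IN ({marks})
        """,
        [group_id, *grouped_ids, *grouped_ids],
    )
    # Index the working table just like the regular tables.
    conn.execute("CREATE INDEX work_counts_tract ON work_counts (tract_id)")
    conn.execute("CREATE INDEX work_counts_eth ON work_counts (ethnicity_id)")


def tract_totals(conn):
    """Total residents per tract, across grouped and ungrouped rows."""
    rows = conn.execute(
        "SELECT tract_id, SUM(residents) FROM work_counts "
        "GROUP BY tract_id ORDER BY tract_id"
    )
    return dict(rows.fetchall())


def drop_working_table(conn):
    """Explicit clean-up, since the table is not a temporary one."""
    conn.execute("DROP TABLE IF EXISTS work_counts")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counts "
        "(tract_id INTEGER, ethnicity_id INTEGER, residents INTEGER)"
    )
    # Made-up sample data: tract 2 has no residents from ethnicities 1 or 2.
    conn.executemany(
        "INSERT INTO counts VALUES (?, ?, ?)",
        [(1, 1, 10), (1, 2, 5), (1, 3, 20), (2, 3, 7)],
    )
    build_working_table(conn, grouped_ids=[1, 2])
    rows = conn.execute(
        "SELECT tract_id, ethnicity_id, residents FROM work_counts "
        "ORDER BY tract_id, ethnicity_id"
    ).fetchall()
    totals = tract_totals(conn)
    drop_working_table(conn)
    conn.close()
    return rows, totals
```

In the made-up sample data, tract 2 has no residents from the grouped ethnicities, yet it still shows up in the tract totals because its ungrouped rows are carried over into the working table. That is the case the much-reduced table would have missed.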
Stayed a bit late to get this working, but I'd been out of the office for an hour and a half in the morning, so we're square. I'm really looking forward to getting this finished off in the next few days; the code has a lot of old functions and debugging functions that need removing once I don't need them any more, and then it will be easier to read.