Extraneous space in an answer in Unit 16 of Wheelock. Reported by user, and fixed.
Worked out reasonably clean ways of presenting the table type 1 and type 2 data for multiple years/tract sets/ethnicities, and then started implementing the first table. I have the code basically working, but I'm not sure whether I'm correctly excluding data when the counts for particular ethnicities are zero, so I need to do some detailed testing. The original code by JD does appear to exclude tracts where the count for a particular ethnicity is 0; if that's true (i.e. if I'm not misunderstanding the original code/calculation) then it seems odd to me. On the other hand, to avoid divide-by-zero errors in some contexts, you have to ensure that sets which have a zero count are left out of the calculation. I'll set up some detailed testing tomorrow, and see if I can produce any conditions where I get a zero result when the data actually contains individuals from one of the ethnicities in the grouping, in one of the tract sets in the grouping.
A request for 6 documents not in the 1858 public collection sent me searching for them, so that I could save the pages as HTML and send them as a zip file. One was hard to locate because the schedule item was not correctly linked (now fixed); it, and another in the same year, also don't appear in the document sequence where they should -- or perhaps not at all -- which suggests a problem with the algorithm that generates the document sequence. I'll look into that. Meanwhile, I want to track any time spent answering queries about non-1858 stuff in case it eats too much into development time. Tagging appropriately.
Lots of stuff to do, and not enough hours in the day...
The Globe and Mail has a great story on the Colonial Despatches project today.
Did the image renaming, adding two leading zeroes that were missing from the new images from CO 60 03. For future reference, the fastest way to do it is at the command line on the server, using rename co_60_03_ co_60_03_00 *.jpg.
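For anyone without the rename utility to hand, the same operation can be sketched in Python. This is only a sketch: the function name add_leading_zeroes and the dry_run switch are my own inventions for illustration, and the defaults simply mirror the command above. Like the command, it replaces the first occurrence of the old prefix in each filename, so running it twice would add the zeroes twice.

```python
from pathlib import Path


def add_leading_zeroes(folder, old="co_60_03_", new="co_60_03_00",
                       pattern="*.jpg", dry_run=True):
    """Replace the first occurrence of `old` with `new` in each matching filename.

    Hypothetical helper: returns a list of (old_name, new_name) pairs, and
    only touches the files when dry_run is False.
    """
    planned = []
    for path in sorted(Path(folder).glob(pattern)):
        if old not in path.name:
            continue  # leave unrelated files alone
        target = path.with_name(path.name.replace(old, new, 1))
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling it with the default dry_run=True first gives a list of planned renames to check before anything is changed on disk.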
Now, the image linking itself is dependent on a file in the db called scan_images.xml. That file is a list of the image files which exist. This seems to have been generated from a straight file listing; I don't know how I generated that, but there are various ways to do it. Then the file listing was run through a Transformer operation called scan_file_list_to_xml.seq.xml. Once we have the XML file (it has a TEI framework based on <facsimile>), it can be uploaded into the db. Following that, there's a file called adding_page_scan_links.xquery which contains the XQuery code to run against the database. That code checks for any documents which don't have page scan links, but for which page scans exist, and inserts them.
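Since I can't remember how the file listing was turned into XML in the first place, here is one possible way to do that first step, sketched in Python rather than with the Transformer sequence. The real layout of scan_images.xml is not recorded here, so the structure below (a TEI facsimile element holding one graphic element per image, with the filename in a url attribute) is an assumption, and file_list_to_facsimile is a made-up name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def file_list_to_facsimile(filenames):
    """Build a <facsimile> element listing the .jpg files in `filenames`.

    Assumed layout (not confirmed against the real scan_images.xml):
    one <graphic url="..."/> child per image, sorted by filename.
    """
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile")
    for name in sorted(filenames):
        if name.lower().endswith(".jpg"):
            ET.SubElement(facsimile, f"{{{TEI_NS}}}graphic", {"url": name})
    return ET.tostring(facsimile, encoding="unicode")
```

The list of filenames could come from os.listdir on the image folder or from a saved directory listing; the resulting string would then be saved and uploaded to the db as before.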
The relevant files are sitting in my coldesp root and xml folders. So now that's documented, and I won't have to figure it out again.
Went to a planning meeting for the project, following on from the Launch. Nothing for public report.
Finally clued in that the new co_60_03 files have the same problem as the original 130-odd did: they're named with one zero too few. I'll now have to rename all the new ones. Flipping heck.
Created the 3 sizes of JPEG based on the 1312 new TIFFs. This was done with Photoshop, using Actions I'd already created on the Mac (File / Automate). The sequence is: turn the TIFFs into full-size JPEGs, then turn the full-size JPEGs into 800px versions, then turn the full-size JPEGs into 60px thumbnails.
Pushed the results up onto the server in the coldesp account, along with the others. Now I have to try to remember what's necessary, if anything, to make these new images integrate into the system. I seem to have failed to blog that in enough detail for myself, so I'll make sure I do blog it as soon as I've figured it out.
Wrote a couple of new functions that retrieve data for ethnicities and combine it in hash tables organized by tract, giving us a sort of pseudo-table against which we can do the new calculations. Did the math on the same test data I used the other day (Sherbrooke 1961, Irish and Scottish combination) and got a substantially different result, which I think is probably what we really want. Sent my results over to JS-R for him to confirm that we're now doing the right thing.
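The idea of combining ethnicities into a per-tract hash table can be illustrated with a small Python sketch. This is not the project's actual code: the function name combine_by_tract, the input shape (a dict mapping each ethnicity to its own tract-to-count dict) and the tract ids in the usage note are all assumptions made for the example.

```python
def combine_by_tract(counts_by_ethnicity, ethnicities):
    """Sum the counts for the chosen ethnicities, keyed by tract.

    counts_by_ethnicity: {ethnicity: {tract_id: count}} (assumed shape).
    Returns {tract_id: combined_count}; a tract missing for one ethnicity
    simply contributes nothing for that ethnicity.
    """
    combined = {}
    for ethnicity in ethnicities:
        for tract, count in counts_by_ethnicity.get(ethnicity, {}).items():
            combined[tract] = combined.get(tract, 0) + count
    return combined
```

With invented figures, combine_by_tract({"Irish": {"T1": 5, "T2": 0}, "Scottish": {"T1": 3, "T3": 7}}, ["Irish", "Scottish"]) gives one combined count per tract, which is the pseudo-table the later calculations would run against.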
If this is correct, then it should be all we need to create the two remaining calculations (xP*y, the Interaction Index, or exposure of one group to another; and Adjusted P*, the Interaction Index expressed as a proportion). Then it's just a matter of building tables.
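As a reminder of what the first of those calculations involves, here is a minimal Python sketch of the Interaction Index in its usual textbook form: the sum over tracts of (x_i / X) * (y_i / t_i), where x_i and y_i are the counts of the two groups in tract i, t_i is the tract's total population, and X is the total of group x across all tracts. The function name, the dict-per-tract inputs and the way zero counts are handled are assumptions for illustration, not necessarily what the project code does; Adjusted P* is left out because its exact formula isn't given here.

```python
def interaction_index(x_by_tract, y_by_tract, total_by_tract):
    """Exposure of group x to group y (xP*y), textbook form.

    Each argument is an assumed {tract_id: count} dict. Returns None when
    group x has no members at all, since the index is then undefined.
    """
    x_total = sum(x_by_tract.values())
    if x_total == 0:
        return None  # avoid dividing by zero for an empty group
    index = 0.0
    for tract, x in x_by_tract.items():
        t = total_by_tract.get(tract, 0)
        if t == 0:
            continue  # an empty tract cannot contribute; skip to avoid x / 0
        index += (x / x_total) * (y_by_tract.get(tract, 0) / t)
    return index
```

Note that a tract where group x has a zero count contributes zero to the sum without needing to be excluded explicitly; only an empty tract or an empty group needs guarding against.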