Category: "Activity log"
Before making the recent changes to the VIHistory site, I realized my dev, svn repo backup, and production instances were out of sync, so I spent a couple of hours sorting all that out.
Also, due to recent security problems on some Apache servers, the web-based pgadmin is no longer accessible, so I installed the pgAdmin3 app and configured it to connect over SSH to the postgres server so I could do basic admin through that interface.
In the searchcensus page, PD wanted the list of subdistricts that appeared in the popup list to be sensitive to which census year check boxes were selected. I modified the showDataListYear function in the script/showdatalist.js file to check the status of those checkboxes and then added an argument to the call to datalist.php to send that information to the popup page.
In datalist.php, I modified the code so that if there is a value in the yearsChecked argument, a newly written block of code is executed which ensures that only those subdistrict records which belong to the selected years appear in the popup window.
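For reference, here's a rough sketch of what that filtering in datalist.php amounts to (this is not the actual code; the table and column names, and the format of the yearsChecked argument, are assumptions):
<?php
// Sketch only: restrict the subdistrict popup list to the selected census years.
$yearsChecked = isset($_GET['yearsChecked']) ? $_GET['yearsChecked'] : '';
$sql = 'SELECT DISTINCT location_03, census_year FROM location';
if ($yearsChecked !== '') {
    // expect a comma-separated list like "1891,1901"; keep only 4-digit years
    $years = array();
    foreach (explode(',', $yearsChecked) as $y) {
        if (preg_match('/^\d{4}$/', $y)) { $years[] = $y; }
    }
    if (count($years) > 0) {
        $sql .= " WHERE census_year IN ('" . implode("','", $years) . "')";
    }
}
$sql .= ' ORDER BY location_03, census_year';
$result = pg_query($connection, $sql); // $connection assumed to come from the shared config include
?>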
As it stands, it is possible for the user to select e.g. the 1901 checkbox, pick a subdistrict from 1901, then deselect the 1901 checkbox and check e.g. the 1891 checkbox. That configuration may result in a query that returns no values (namely, all records from 1891 that have a subdistrict name which only exists in 1901). The only exception is if there happens to be an 1891 subdistrict name which is identical to the 1901 subdistrict name. In that case, you'd get the records from 1891 that have the 1891 subdistrict name which happens to be the same as the selected 1901 subdistrict name, but that's probably expected behaviour.
Many of the districts/subdistricts appear in only certain censuses (e.g. no Alberni in 1881, then one in 1891, 1901 and 1911). Turns out each location is a record in the locations table that knows nothing about other location records (i.e. Alberni1891 has no relationship with Alberni1901).
In the search gui, there is a popup dialog that used to list all the distinct values of the subdistrict text (e.g. "Alberni"). The problem with that is that the user doesn't know whether a given district ("Alberni") exists in a given year (1881 vs 1891). I tried coming up with a smart way of manipulating the year checkboxes based on what was happening with the subdistrict values, but that was adding more UI problems than it was solving. So, I added the year to the subdistrict text for each value in the drop down (e.g. Alberni 1891, Alberni 1901) so at least the user can see which years that district exists in.
Next problem is that I can't use e.g. "Alberni 1891" as the value to search for in the SQL query, because of course that string doesn't appear in the subdistrict field of the locations table. So, I had to strip the year off of the value in the text box the user sees. The user must still check the checkbox for the year(s) they want to include in the search. Not ideal, but the best compromise I could come up with.
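For what it's worth, the stripping step is basically this (a minimal sketch, assuming it happens server-side just before the query is built; the function name is mine, not the site's):
<?php
// Sketch only: "Alberni 1891" -> "Alberni"; values without a trailing year pass through unchanged.
function strip_year_suffix($subdistrict) {
    return preg_replace('/\s+(18|19)\d{2}$/', '', trim($subdistrict));
}
// e.g. strip_year_suffix('Alberni 1891') === 'Alberni'
?>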
I modified the default settings so that all years are checked by default. So if the user selected "Alberni 1891" and hit submit, they'd actually get values for Alberni for 1891, 1901 and 1911. The hits show the date, so it would be pretty obvious they're getting results they don't want, and they'd then have to go back to the search form and uncheck the years they don't want. That seems better than the previous behaviour, in which only the first year checkbox was checked by default, so in our example the user would get back 0 hits (as there was no Alberni district in the first census year).
PD and ES of History wanted to get a dump of the 1901 Census. They tried using the Export feature on the site, but all they got was a white, empty window when they executed it. Looks like the number of records they were trying to export (about 46,000) exceeded some RAM or processing-time limit in php, so the job never returned, resulting in the white window. I did a query to export all the males, then all the females. Those two exports worked. I then combined the two into one big spreadsheet file.
I then went into phppgadmin and discovered there were 5 records unaccounted for. Those had a value of 9 in the sex field. I then went back to the website GUI and searched for each of those five records and then incorporated them into the spreadsheet.
So, I did get the client what they asked for by a workaround, but didn't precisely identify the cause of the problem, let alone a solution.
At PD's request, I rejigged the properties chunk of the site: renamed the folder from ta to properties, took the one existing page (taxassessment.php) and split it into properties.php (landing page with links to tax assessment and building documents), taxassessment.php (intro and links to 3 tax assessment tables), and buildings.php (blurb on background and link to search page).
Also made minor tweaks to index.php page in root and the streetname_changes.php file.
Seems like renaming a folder using svn rename is a bad idea. Probably better to create a new folder, manually copy the files over, then manually add all the new stuff to the svn repository, commit all that and, assuming it is working, delete the unwanted folder from the repo. I.e. manually do the copy and delete that rename supposedly does for you.
I wanted to rename a folder from "taxes" to "properties". The folder contained one file. I did this:
1)
svn rename taxes properties
no problem. Then I committed the repository and got this:
svn: Commit failed (details follow):
svn: Item '/site_root/content/taxes' is out of date
2) So I tried updating the site and got back a taxes folder.
Then I cd'd to the containing folder and tried this:
svn delete taxes
worked. Tried to commit and got:
svn: Commit failed (details follow):
svn: Aborting commit: '/Users/sarneil/Documents/Projects/history/ViHistory/instances_of_site/svnrepo/trunk/site_root/content/ta' remains in conflict
then tried:
svn delete --force taxes
worked. Tried to commit and got:
svn: Commit failed (details follow):
svn: Aborting commit: '/Users/sarneil/Documents/Projects/history/ViHistory/instances_of_site/svnrepo/trunk/site_root/content/ta' remains in conflict
3) did this:
svn status
and got
D C taxes
> local delete, incoming edit upon update
4) Did a search on stackoverflow, and based on that, then tried
touch taxes
svn revert taxes
worked. I decided to try getting rid of the file inside first:
svn rm ta/taxassessment.php
worked. Then committed the change and now have things in sync with an empty taxes folder.
5) Then tried to delete the pesky folder
svn delete taxes
worked. Tried to commit that and got
svn: Commit failed (details follow):
svn: Item '/vihistory/trunk/site_root/content/ta' is out of date
6) Last resort on stackoverflow was "In a nutshell, the trick is to go to the .svn directory (in the directory that contains the offending file), and delete the "all-wcprops" file."
Did that, followed by
svn update
svn commit
and the taxes folder is now gone in the repository and in my copy.
No idea which of those steps (other than 6) are required.
I added the building_documents table to the production db, and added the necessary files in the production front end for the building construction data set.
I updated the tax assessment and documents to include the material on changed street names.
Next will be removing the building_permit stuff from the dev db and dev front end, then from the prod db and prod front end.
I've got a working instance of the building construction documents data set in the dev site. Searchable by name (see previous post), street and year. As some of the dates are yyyy and others are yyyy-mm-dd, I turned the field into a character field and added some code so that in the where clause only the year part of the date gets included.
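The shape of that change is roughly the following (a sketch only; the field and variable names are assumed from the surrounding search code rather than copied from it):
<?php
// Sketch only: bcd_date is a character field holding either "yyyy" or "yyyy-mm-dd",
// so compare just the first four characters when filtering by year.
if (isset($form['year']) && preg_match('/^\d{4}$/', $form['year'])) {
    $where .= ($where == '' ? ' WHERE ' : ' AND ')
            . "substr(bcd_date, 1, 4) = '" . pg_escape_string($form['year']) . "'";
}
?>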
Assuming PD approves, I'll create an identical instance in the production db and front-end and then clear out the building permits table and front-end stuff from both dev and prod instances, as that has been superseded by the building construction document stuff.
I copied the code for the building permit pages to create the building construction document pages (i.e. inc/bcdformbasic.inc, inc/bcdresults.inc and search/searchbcd.php). When I did a search and put Holmes into the owner field in searchbcd.php, I got zero hits, even though I know there are five records. When I do a similar search in searchbp.php, I get the five hits.
Here's the line of code in search/searchbp.php:
$where .= make_where_pattern($surname, 'bp_owner_surname', $where, $form, $fuzzy_func);
Here's the line of code in search/searchbcd.php that returns 0 hits:
$where .= make_where_pattern($holder_name, 'bcd_holder', $where, $form, $fuzzy_func);
Here's the line of code in search/searchbcd.php that returns 5 hits:
$where .= make_where_pattern(('%'.$holder_name.'%'), 'bcd_holder', $where, $form, $fuzzy_func);
I think I've figured it out, based on testing and reading the docs on the postgres SIMILAR TO operator. In the building permits case, the surname is just the surname; in the building construction documents case, the holder is a surname and given names, and that extra information causes SIMILAR TO to return false (the docs say the regexp match is across the entire string, which confirms my theory).
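A quick illustration of the difference (the name value here is made up; SIMILAR TO has to match the whole string, which is why wrapping the search string in % fixes the bcd case):
<?php
// Illustration only, run against any pg connection:
$r1 = pg_query($connection, "SELECT 'Holmes, John A.' SIMILAR TO 'Holmes'");   // returns f (no match)
$r2 = pg_query($connection, "SELECT 'Holmes, John A.' SIMILAR TO '%Holmes%'"); // returns t (match)
?>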
The name search in the bcd is now more liberal than in any of the other tables: the bcd finds the search string anywhere in the name field, whereas the others have to match much more closely (not sure exactly how closely), but they have separate surname and forename fields and only the surname field is searched.
Did the following so far to normalize the data:
1 delete first column (sequential id number)
2 render rooms as integer
3 replace left double guillemet with 1/2
4 render cost value as integer
5 clear trailing whitespace at end of file
6 provide correct field names in first line
bd_permit_ref,bd_date,bd_notes,bd_holder,bd_street,bd_street_num,bd_sublot,bd_lot,bd_block,bd_section,bd_add_alt_repair,bd_purpose,bd_rooms,bd_cost,bd_arch_builder
In records in which the room number value was something like 2 buildings 4 rooms each, I rendered that as 8 rooms.
Had to stop when I noticed 3000 records have only a year in the date field (8000 have year-month-day). Postgres requires yyyy-mm-dd for a date data type. I recommended to PD that we change that field to a string and accept less sophisticated comparison operations. Now waiting to see if he thinks that's ok, or if he wants to explore a more involved solution (namely, split the field into 3 separate fields and then add a bunch of complicated code as needed to allow for filtering by dates).
Updated my dev instance of the front end (in my account) so it's the same as the production instance (minus the huge cache of images for the maps) in the taprhist account.
Still no luck setting up a time with the sysadmins to clear out the huge table logging search queries (6 million records), and to migrate dbs. Will proceed with the dbs I've got.
- production instance is pointing to the vihistory db ; dev instance is pointing to the vihdev db
- added a building_documents table with appropriate fields to the vihdev instance
- added a building_documents_bd_id_seq sequence to the vihdev instance
Got answers to questions and data spreadsheet from PD.
The production front end is pointing at the vihdev db. I tried to back that up using the psql command provided by Jaime and I got a file of zero bytes. I then tried to use the export feature in phpPgAdmin and that brought the db service down. Here's the error message, which suggests that the device is full, but which we now think is actually a permissions problem:
==================================
SQL error:
PANIC: could not write to file "pg_xlog/xlogtemp.24088": No space left
on device
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
==================================
Sys-admins cleaned things up and restarted the service within a few hours. Thanks.
I tried connecting to the vihistory db running in the same environment, but the connection failed.
So, until I get a successful backup of the data in the server, I'm very hesitant to do any work on it. Here's what I've proposed to the sys-admins:
1) export and clear big log table from db
- write table search_log in db vihdev to file
- clear contents from table search_log in db vihdev
NB When I recently tried it from my Mac, I got a search_log.err file of
size 0 bytes. Since the last time we successfully executed the command
below the db engine has moved to mango.
psql -W -U viAdmin -h pgsql.hcmc.uvic.ca -A vihdev --command "copy search_log to stdout;" > search_log.err
NB table has 6.5 million records, each about 1K = 6 gigs
2) create a new db to use as the production instance
- clone table vihdev, name clone vihprod
- drop db vihistory
- modify acl on vihprod based on settings for vihdev
NB I have to change connection strings on site
3) set up long-term plan for write-then-clear search_log table
NB cron job vs manual
NB frequency
4) ensure I can migrate future changes in vihdev db to vihprod db
PB sent me a large jpg to use as a banner on the conference site. Took a couple of guesses to get something with the correct dimensions. Also modified the bg of the image from white to the taupe colour used on the site - they were OK with that.
Turned out that
- headerTitle div had a number of peculiar styling rules
- #header h1 had odd padding settings which resulted in the banner being noticeably off-center
I overrode those by adding selectors in the cha2013styles.css file. Noticed that a lot of elements have many selectors and conflicting style rules applied to them - sometimes 2 or 3 selectors for the same id in the same css file.
Finally, it seems that the CMS doesn't like comments in css files, so I can't include any.
Here's the contents of the modified css file to date (with comments added here by me):
/*hide certain items in the nav-bar at the top of the page*/
/*
not logged in
li:nth-child(1) = home http://conferences.uvic.ca/index.php/index/index/index/index
li:nth-child(2) = about http://conferences.uvic.ca/index.php/cha/index/about
li:nth-child(3) = log in http://conferences.uvic.ca/index.php/cha/index/login
li:nth-child(4) = account http://conferences.uvic.ca/index.php/cha/index/user/account
li:nth-child(5) = search http://conferences.uvic.ca/index.php/cha/index/search
li:nth-child(6) = announcements http://conferences.uvic.ca/index.php/cha/index/announcement
logged in
li:nth-child(1) = home http://conferences.uvic.ca/index.php/index/index/index/index
li:nth-child(2) = about http://conferences.uvic.ca/index.php/cha/index/about
li:nth-child(3) = user home http://conferences.uvic.ca/index.php/index/cha2013/user
li:nth-child(4) = search http://conferences.uvic.ca/index.php/cha/index/search
li:nth-child(5) = announcements http://conferences.uvic.ca/index.php/cha/index/announcement
*/
div#navbar ul.menu li:nth-child(1),
div#navbar ul.menu li:nth-child(2),
div#navbar ul.menu li:nth-child(4) {
display:none;
}
div#navbar ul.menu li a[href$=account] {
display:none;
}
div#navbar ul.menu li a[href$=search] {
display:none;
}
div#navbar ul.menu li a[href$=announcement] {
display:none;
}
/* make header have thinner border on bottom and mirror it on top */
#header {
border-bottom: 1px solid #898E79;
border-top: 1px solid #898E79;
}
/* get rid of odd padding around h1 in header to ensure contents are centered */
#header h1 {
padding: 0 0 0 0;
}
/* headerTitle has a barrage of selectors and style rules, over-rule them all
most importantly over-rule the fixed height so the banner image can be contained elegantly */
#headerTitle {
height: 100%;
padding-bottom: 0px;
padding-top: 0px;
margin-bottom: 0px;
margin-left:0;
}
At the request of PD, I processed the raw checkcensus csv files he submitted into something suitable for upload to the database. Oddities in the data that had to be resolved: street numbers such as 7-9 or 12 and a half; number of occupants such as 20+, 20plus, 20 to 30 etc.; and quotation marks and commas in the dataset being confused with the quotation marks and commas used as delimiters.
Also set up the table, created the fields including an entry in the sequences table, all based on similar work done for the 1871 census, which was also a kind of one-off data set.
Did some twiddling with the presentation of results (particularly with how to represent the various note fields). For those records that have an id for the 1891 Dominion census, wrote code to generate a link to the full record for that individual in the Dominion Census, but have not modified the 1891 Dominion census in any way (e.g. to include a link back to an entry in the 1891 Check census).
Looked into what's involved in updating 1400+ records in the census_1911 table.
Have spreadsheet from PD. About 37 of the fields in the db can be just read in from the spreadsheet. About 12 of the fields in the db have to be calculated based on values in the spreadsheet (the nature of the calculation varies for each of the 12 instances; the notes field in the PG table describes the needed calculations). Not sure how Jamie did this last year, but it looks like I'll have to write a bunch of code to do that processing and generate the csv needed.
Then I'll need to tread carefully regarding the actual upload. I think I'll need to delete the existing records and then upload (using the copy command, or possibly the import feature in the admin client).
PD asked me to create one page to display a table of all the streets (or portions of streets) that have been renamed over the time range of the VIHistory datasets. I did that by running the tab-delimited file through a few regexes to produce a styled html table and putting that into a new page.
PD also asked for a similar page to display a table of all the addresses that were renumbered in 1907. That took a bit more regex processing, as I wanted to create a heading for each street and then a table for all the addresses on that street.
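For the record, an equivalent scripted version of that conversion would be roughly the following (a sketch only; the file name and column layout are assumptions, and the real work was done with regexes as described above):
<?php
// Sketch only: turn a tab-delimited file of street renames into a styled html table.
$lines = file('street_renames.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
echo "<table class=\"datatable\">\n";
echo "<tr><th>Old name</th><th>New name</th><th>Notes</th></tr>\n";
foreach ($lines as $line) {
    $cells = array_map('htmlspecialchars', explode("\t", $line));
    echo '<tr><td>' . implode('</td><td>', $cells) . "</td></tr>\n";
}
echo "</table>\n";
?>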
Made copies of a number of ta (tax assessment) files, renamed them and edited them to work with the building_permits table in the db. Did this in the dev instance in my account on our server.
Files I've added:
site_root/inc/bpformbasic.inc
site_root/inc/bpresults.inc
site_root/search/searchbp.php
The link to the searchbp.php page is on the tax assessment page (ta/taxassessment.php).
Emailed JL and PD for guidance on which fields to include in search interface(s) and which results fields are primary and which secondary.
More headaches with svn than the actual code (as usual). I had in the repo and on my local drive a folder (bp) containing 1 file. I did an svn delete and the file was deleted in the local instance, but not the folder. When I then did a commit, I got a file out of date error. When I tried various ways of sorting out this problem I ended up consistently getting "'/path/on/local/drive/' remains in conflict" errors.
Googled it and discovered I'm not alone: quite a number of people renaming, moving or deleting files have everything go smoothly except for one file or folder that somehow gets into "conflict".
Solution:
svn resolved path/to/conflicted folder
svn update path/to/conflicted folder
svn commit -m "resolving conflicted folder or whatever"
Created a building permits table in the vihdev db. Noticed that to auto-increment the building_permit_id field, you have to reference a sequence, so created the necessary sequence modelled on others I found in the db (census tables).
Processed the raw data file (spreadsheet) into normalized data (typed a couple of the data fields that I could e.g. int or date and normalized data to comply with the constraints I had established e.g. length of varchar fields). Saved that as a CSV (rather than tab-delimited) as the documentation seemed to favour the CSV approach.
The only substantial fiddling I had to do with the data was for all the records whose date field was only a year (e.g. 1889): I arbitrarily assigned them the 1st of January (e.g. 18890101), as the date field requires 8 digits.
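In script form that padding amounts to something like this (a sketch; the actual fix was done on the data file itself):
<?php
// Sketch only: pad year-only values out to the 8 digits the date field requires.
// "1889" -> "18890101"; values that already have a full date pass through unchanged.
function pad_permit_date($value) {
    return preg_match('/^\d{4}$/', $value) ? $value . '0101' : $value;
}
?>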
Once that instance uploaded successfully and everything was working in the dev instance of the db, I did the exact same thing in the production instance, just so I have a second copy of the thing somewhere.
Replaced the contents of the trunk, Alex branch and the backup branch with the files as updated by Martin and Greg, so they're up to date.
Gave AD all the svn, web account and db connection info he should need to get to the files, check them out, and post them to the web space for testing.
note to self on nuts and bolts
on local file system:
create the folder structure you want (if you're copying an existing local instance of an svn project, you have to delete the .svn folder from each folder in that project)
on command line,
cd to parent folder of the one you want to add (that parent folder has to already be in svn)
svn add FOLDER_YOU_WANT_TO_ADD
svn commit -m "message about adding new folder"
There are three files in the site which contain database connection strings:
inc/config_EDIT_ME.inc
content/maps/include/conf_EDIT_ME.inc
content/maps/include/config_EDIT_ME.xml
In each of these three files, the values for the database connection string have been replaced with placeholders. You have to make a copy of each of those files with the following names:
inc/config.inc
content/maps/include/conf.inc
content/maps/include/config.xml
In the copies, substitute the correct values for your connection string.
If the folder is in svn (which it probably is), you'll need to use svn add to add each of the files to the repo, then do your svn commit.
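For anyone doing this setup, the copied inc/config.inc presumably ends up looking something like the sketch below (the variable names, placeholder values and connection-string format here are assumptions; check the _EDIT_ME file for the real placeholders):
<?php
// Sketch only -- not the actual file. Substitute real values for the placeholders.
$db_host = 'YOUR_DB_HOST';
$db_name = 'YOUR_DB_NAME';
$db_user = 'YOUR_DB_USER';
$db_pass = 'YOUR_DB_PASSWORD';
$connection = pg_connect("host=$db_host dbname=$db_name user=$db_user password=$db_pass");
?>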
I've now fixed the streetcar map problem, by changing all filenames and references so that they're consistently referring to 1939 instead of 1936. The fixes have been committed to trunk, and put up on the website.
The other two issues remain outstanding; PD will get back to me with the correct firemap URL at Malaspina, and we'll wait until the DNS has been changed before addressing the problem with Firefox and captcha, since it seems to be cookie-related.
PD has checked the new VIH site on taprlans/www, and reports only these issues:
- Captcha reports errors for him on FF. I can't reproduce this -- it works fine for me -- so waiting for more details.
- There's a link on the maps.php page pointing to the "1936 streetcar map" which fails, defaulting to the panorama, because it should point to 1939. The confusion arises thus:
The map itself has the date "1939" on it. The .map file (the basic definition file) is called "vicstreetcar1939.map". However, it contains pointers to images called victoria-streetcar-1936.png and victoria-streetcar-1936-key.png, and this page: http://hcmc.uvic.ca/~taprhist/content/maps/maps.php has 1936 in its caption. However, if you go to the map viewer, click on the Maps menu and drill down to it, you see the caption "1939 - Streetcar routes". I think that: the actual date is 1939; the images are wrongly named, as are the pointers to them in the .map file; and the caption on the maps.php page is wrong, whereas the Maps menu in the map viewer is right.
Waiting for PD to confirm my analysis before changing the image file names, the .map file, and the caption/link on maps.php. - PD reports this:
The link on the page describing the 1885 Fire Insurance Plans of Victoria needs to be changed. The page in question is located at: http://hcmc.uvic.ca/~taprhist/content/maps/firemap.php The hyperlink should be redirected to: http://www.mediastudies.viu.ca/steeple/index.htm [The link currently points to an obsolete server - http://cdi.mala.bc.ca/firemaps/]
However, the new URL is about 1891 panorama images; it has nothing about the 1885 Fire Insurance Plans. Waiting for the correct URL from PD.
There's one special map that uses SVG for its interface, and which was originally working only on FF; DB had put a complete block on other browsers, but now most of them support SVG so I've removed that block. Browsers that don't support SVG should get with the program. Committed that change to SVN, but I was unable to commit the bulk of my additions (code which was not originally in SVN, but should have been) because we ran out of disk space on revision.tapor.uvic.ca. We'll be switching to the new SVN on Monday, so hopefully this problem will be solved.
Got the site basically working by doing this:
- Moved everything from taprhist/vihdev/www to taprhist/www (it really doesn't like living in that odd location, and there are hard-coded paths in several text files).
- Removed "vihdev" from paths in config files.
- Cleaned out the /home1t/taprhist/www/content/maps/cache folder so it had to start rebuilding.
- Now we found that most maps were working, but a handful were failing. The failing maps had one line in their .map file:
METADATA "queryable" "true" "tile_source" "cache" <-- This line has to be removed. END
Our surmise is that this problem line causes the server to construct a broken path to a cache folder that doesn't exist or isn't writable, and therefore it cannot read or construct tiles.
We now propose to have the DNS repointed so that vihistory.uvic.ca points at taprhist/www, and keep the live site there.
Greg also noticed that vihistory.ca is broken; outside of the ring it's pointing at mala.bc.ca DNS servers, so he's emailed PD to get him to fix that on the domain host.
I've been working for the better part of the last couple of days trying to figure out why the imap maps in ViHistory are not appearing.
Problem appears with the production front end and a test front end connected to either the old db server or the new db server.
Problem appears with a test front end which is the production front end minus the captcha code (which is the only code that has changed since Jamie left us a working site).
We weren't getting errors when we trolled the server logs on lettuce.
We did get the following errors from the sysadmin:
Apache error log:
[Tue Mar 06 09:16:21 2012] [error] [client 96.54.151.99] [Tue Mar 6
09:16:21 2012].616424 loadSymbolSet(): Unable to access file.
(/home1t/taprhist/www/content/maps/user/symbol/generic.sym), referer:
http://vihistory.ca/content/maps/htdocs/index.php?map=vicbird1889
AND syslog:
2012-03-06T09:16:21-08:00 local@mustard.hcmc.uvic.ca user.notice
php-cgi: PHP Warning: [MapServer Error]: loadSymbolSet(): (/home1t/taprhist/www/content/maps/user/symbol/generic.sym)
2012-03-06T09:16:21-08:00 local@mustard.hcmc.uvic.ca user.notice in: /home1t/taprhist/vihdev/www/content/maps/htdocs/init.php on line 125
2012-03-06T09:16:21-08:00 local@mustard.hcmc.uvic.ca user.notice
php-cgi: PHP Warning: Failed to open map file /home1t/taprhist/vihdev/www/content/maps/user/map/vi1798.map in /home1t/taprhist/vihdev/www/content/maps/htdocs/init.php on line 125
2012-03-06T09:16:21-08:00 local@mustard.hcmc.uvic.ca user.notice
php-cgi: PHP Fatal error: Call to a member function getMetaData() on a non-object in /home1t/taprhist/vihdev/www/content/maps/htdocs/init.php on line 131
2012-03-06T09:16:21-08:00 local@mustard.hcmc.uvic.ca local0.debug
suphp_wrapper: 0 PHP5
/home1t/taprhist/vihdev/www/content/maps/htdocs/init.php
That init file is as provided by the imap people, so I really, really doubt it is causing the problem, though it is where the problem occurs. It looks like that file is trying to create objects based on what it reads from some kind of config file, and somehow that process is breaking down, so the object doesn't get created, so invoking a method on the (non-existent or empty) object throws the error. More precisely, it looks like the config.php file is supposed to create an array in the variable aszMapFiles and then those values are used in the init file, but for some reason something is failing with the way that array and associated variables are being populated.
Drew migrated the viHistory and vihdev postgres db's to the new db server (pgsql.hcmc.uvic.ca)
I created an instance of the viHistory site at http://lettuce.tapor.uvic.ca/~sarneil/vihtest/site_root and changed the connection string to point at the postgres service on the new db server (pgsql.hcmc.uvic.ca) rather than the old db server (db.tapor.uvic.ca). I didn't change anything else. After a couple of iterations through errors coming back from the db, Drew made the necessary adjustments and the connection is now working to the new db.
What I've got to do now is take the vihdev db on the db.tapor.uvic.ca server and put it into the viHistory db on the pgsql.hcmc.uvic.ca server. Then when we change the connection string for the production site, we change the name of the db it's pointing to as well as the server it's pointing to.
The trunk I started with has the captcha modifications in it.
I created a tag called beforeAlex which is a copy of the trunk corresponding to the publicly accessible site.
I then svn renamed the config.inc file to config_Edit_Me.inc, abstracted the connection string values and committed that. I searched all the files for any other instance of the connection string values and did not find any.
I then created a tag called configAbstracted which is a copy of the trunk with the config file modified (that's the only difference between the configAbstracted tag and the beforeAlex tag).
I then created a branch called alex which is identical to the configAbstracted tag (at least until Alex starts modifying it).
Still to do:
- create an svn user for Alex with rw privileges on https://revision.tapor.uvic.ca/svn/hcmc/vihistory/branches/alex
- create a user in the database for Alex with read privileges, which he can use to modify the config_Edit_Me.php file.
Had a puzzling time with this: a file originally called censusformbasicCaptcha.inc was renamed to censusformbasic.inc, and a file originally called censusformbasic.inc was renamed to censusformbasicNoCaptcha.inc, but I forgot to do svn rename for those. Subsequently had a miserable time trying to commit to the repository as I kept getting "File censusformbasicCaptcha.inc is out of date, path not found", but only for that file. I eventually svn delete'd the entire svnrepo folder on my local computer and checked out the entire repository, and the errors went away.
To make captcha work, you need to make modifications to the included form file and to the file that invokes the form.
// I'm using the $_SESSION['captchaPassed'] variable, so that even if the user moves from form to form or query to query, they only have to verify one captcha per session
// if $_SESSION['captchaPassed'] is not set, the user has not yet tried to validate a captcha field in this session
// if $_SESSION['captchaPassed'] is set and has value false, the user has tried to validate a captcha field in this session and failed
// if $_SESSION['captchaPassed'] is set and has value true, the user has successfully validated a captcha field in this session
I'll need to rewrite the code in the annotations section to call on the instance of captcha in the captcha folder rather than the one in the annotate folder.
Because the js file currently assumes the call is coming from a file at a certain level in the site hierarchy, and the search and annotate files are at different levels in that hierarchy, I'll have to either write a separate function for the annotate call, or modify the calls to the loadcaptcha function so they pass in the path from themselves to the root as a second argument.
==================================================
Changes to the actual search form (e.g. censusformbasic) as seen in censusformbasicCaptcha.php
==================================================
1) Make sure this line points to correct file (for production, should be searchcensus.php) :
$search_arg = "searchcensusCaptcha.php?start=0&form=$form&show=n";
2) Add this block at end of form:
<?php
if ((isset($_SESSION['captchaPassed'])) && ($_SESSION['captchaPassed'])) {
//do nothing
} else {
?>
<fieldset class="">
<legend>
Verify
</legend>
<label>
To verify that you are a real, live human and not a spambot,<br />
enter the text you see in the image below.<br />
If you can't read the text, click
<a href="javascript:loadcaptcha('captcha');"><b>here</b></a>,
or see the <a href="<?php print $doc_root; ?>/help/annotation.php#verify">help page</a>.
<?php
if ((isset($_SESSION['captchaPassed'])) && (!($_SESSION['captchaPassed']))) {
print '<br /><br /><span style="color: #ff0000; font-weight: bold;">The text you entered does not match the image. Try again. </span>';
}
?>
</label>
<br />
<label for="verifier">Image text (5 letters)</label>
<input type="text" name="verifier" id="verifier" value="" />
<img id="captcha" src="<?php print $doc_root; ?>/captcha/captcha.php" width="200" height="60" alt="Verify" style="border: 1px solid #000000;;"/>
</fieldset>
<?php
} // end of if isset($_SESSION['captchaPassed']) ...
?>
==================================================
Changes to the page calling the form (e.g. searchcensus.php) as seen in searchcensusCaptcha.php.
==================================================
1) Add these blocks to head:
/**
* Load captcha library.
*/
require_once("$doc_root/captcha/php-captcha.inc");
<script type="text/javascript" src="<? print $doc_root ?>/script/loadcaptcha.js"></script>
2) Make sure this line points to right file (for production, should be searchcensus.php):
$search_page = "searchcensusCaptcha.php";
3) Add this block immediately after the first "if (isset($_GET['start'])) {"
//SA we need to defy the spambots
// I'm using the $_SESSION['captchaPassed'] variable
// if $_SESSION['captchaPassed'] is not set, the user has not yet tried to validate a captcha field in this session
// if $_SESSION['captchaPassed'] is set and has value false, the user has tried to validate a captcha field in this session and failed
// if $_SESSION['captchaPassed'] is set and has value true, the user has successfully validated a captcha field in this session
// need to do the CAPTCHA test before any query gets sent to the DB
// Only do the test if the user has not tried to validate yet, or if the user has tried to validate and failed
// my actual test is: if (NOT ((session var is set) AND (session var is true)))
if (!((isset($_SESSION['captchaPassed'])) && ($_SESSION['captchaPassed'] == true))) {
if (PhpCaptcha::Validate($_POST['verifier'])) {
$_SESSION['captchaPassed'] = true;
} else {
$_SESSION['captchaPassed'] = false;
}
}
if ((isset($_SESSION['captchaPassed'])) && ($_SESSION['captchaPassed'] == true)) {
// We know user has successfully validated the captcha, so do the query and display the results
4) add this block immediately before "} // end of if isset($_GET['start'])"
} // end of if isset ($_SESSION['captchaPassed']) ...
5) make sure this line points to correct file (that is, censusformbasic):
require("$doc_root/inc/censusformbasicCaptcha.inc"); //SA was censusformbasic.inc
6) add this block immediately before the second "if (isset($_GET['start'])) {"
if ((isset($_SESSION['captchaPassed'])) && ($_SESSION['captchaPassed'] == true)) {
7) add this block after "pg_free_result($result); }}"
} // end of if isset($_SESSION['captchaPassed']) ...
Checked out the vihistory svn repository at https://revision.tapor.uvic.ca/svn/hcmc/vihistory into a directory on my mac
Copied the files down from the production site (taprhist/vihdev). Next I need to compare the two to make sure that the repository and production site are in sync.
Once that's done, then I can start work on implementing the captcha code from the annotations page into other forms, namely the searchcensus one.
Also, the production site has .svn folders in it, which might be the result of using an svn client on that set of folders, or may just be an artefact of ftping files up to the server from another location that was in svn.
CC has a new student, ES, working on FrancoToile. First task is copy editing.
Got her to log in, then I created an SVN collection on Pomme.
She edited one file, then committed the collection, which worked.
She then tried updating, but of course there were no changes to any of the files, so the client reported that the directory had been skipped - not sure if that is the correct behaviour or not. I'll have to change one of the files (as me) before she comes in tomorrow and see what happens when she then logs in and updates.
Still have not looked at the problem with the rng file not allowing attributes and elements that it should, even though the schema is being invoked.
PD reported a number of bogus annotations had been submitted on three specific dates. I went in through phpPgAdmin to see the annotations table and made sure there were no dependencies that would be broken if I deleted records from that table (there weren't, as each record in that table has a field for the data table and data record id linking the annotation to its data record). I then exported that table, extracted the records from the specified days and sent that to PD as a spreadsheet. He reviewed the records and told me which ones to delete, which I did (about 30 in all).
I also confirmed that the active database is vihdev. The vihistory database is now out of date (though I found some similar bogus entries in the annotations table in that db and deleted them). At some point we may want to drop all the tables in the vihistory db, copy all the tables from vihdev to vihistory and then change the connection configuration in the front-end files. Not a high priority.
Finally, noticed 3.8 million records in the search_log table since the 3rd of July (or possibly the 7th of March). That's consistent with Jaime's earlier blog posting finding 20 million records over 5 years in that table and his suggestion for how to export them to a file. I'll wait before exporting the search log again, as I'll need some help from the sys admins (or may end up asking them to do it as a timed job once every six months or so).
Updated more content on the development server, supplied by PD:
http://tapor.uvic.ca/~vihdev/content/census/1911/census1911.php?page=main
http://tapor.uvic.ca/~vihdev/content/census/1911/census1911.php?page=enumcats
Added some introductory text for the 1911 census, supplied by PD. He also gave me a new version of the homepage text, which I updated.
http://tapor.uvic.ca/~vihdev/index.php
http://tapor.uvic.ca/~vihdev/content/census/1911/census1911.php?page=main
Addressed all of the issues from my July 26 meeting with PD. Specifically:
- I updated the meta tags with new information (1911 and 1901 census years), and fixed the "Courtenay" typo
- I removed the searchable "dot" for 1901 for the "Birthplace of Father" field in the advanced census search form
- I fixed the short list of incorrect records that you gave me (stemming from transcription errors)
We had also discussed adding Racial/Ethnic Origin as a searchable field for 1881, but upon further investigation I've discovered that, due to a bit of confusion on my part (there are a few similar fields), the 1881 census doesn't actually have this data. Racial/ethnic origin is actually "race_id", which doesn't appear until 1901. The closest the 1881 data comes is "origin_id", which actually looks to be the birthplace of the father (that's how it's set up in the search). So, we are going to stand pat rather than make this change.
Due to a primary key sequence inconsistency on the occupation table between my local machine, where I first imported and tested all of the data, and the development server, many of the new occupation codes inserted for the 1911 data had a different occupation_id on each machine. As a result, the occupations in the 1911 view for census records made reference to incorrect occupation IDs and, thus, incorrect occupation codes.
Since occupation_id isn't actually needed - occupation_code is also a unique identifier and doesn't suffer from the same sequence constraints - I decided to add an occupation_code column to all of the census tables and update the views to use this column instead of occupation_id.
The steps I took:
#1: Added occupation_code character varying (5) column to 1881, 1891, 1901, 1911 on both local machine and dev server
#2: For all except 1911, I executed this SQL on the test server:
UPDATE census_1881 SET occupation_code = (SELECT occupation_code FROM occupation WHERE occupation.occupation_id = census_1881.occupation_id)
UPDATE census_1891 SET occupation_code = (SELECT occupation_code FROM occupation WHERE occupation.occupation_id = census_1891.occupation_id)
UPDATE census_1901 SET occupation_code = (SELECT occupation_code FROM occupation WHERE occupation.occupation_id = census_1901.occupation_id)
On my local host, I also executed this on 1911:
UPDATE census_1911 SET occupation_code = (SELECT occupation_code FROM occupation WHERE occupation.occupation_id = census_1911.occupation_id)
#3: Altered the 1881, 1891, 1901, 1911 view definitions to change this (1881 used as example):
LEFT JOIN occupation ON occupation.occupation_id = census_1881.occupation_id
To this:
LEFT JOIN occupation ON occupation.occupation_code = census_1881.occupation_code
Also changed the occupation_id column definition in the views from census_<year>.occupation_id to occupation.occupation_id. This effectively rendered the occupation_id column in the census tables obsolete, since occupation_id is no longer used as the foreign key lookup in the views.
#4: Modified vOccupations<year> views to join the census table on occupation_code rather than occupation_id
#5: Dropped now unneeded occupation_id from census_1911, but not the earlier data. It's not strictly needed in the other tables but I don't want to rock the boat too much.
#6: re-imported all of the 1911 data from my localhost to the development server.
All is good - incorrect occupations have been corrected.
Met with PD yesterday (July 26) to go over some loose ends with the new dataset. Tasks for me that arose from the meeting:
- PD pointed out some inconsistencies with occupation codes; namely, that some 1911 census records have the wrong code. I need to do a full scan to find out how many there are, and then fix them.
- PD also pointed out some incorrect data in 6 of the records. The mistakes were in the original CSV files. I need to go into the database and correct them.
- The 1881 data should have Racial/Ethnic origin as a searchable field - it's in the data but not searchable on the website.
- The 1901 data should not have a searchable 'dot' for birthplace of father.
- The meta tags should be updated with new information to match the latest data imported.
Received SD8 (Hotels and rooming houses) from PD over the weekend, along with some new occupation, relation to head, religion, birthplace, and nationality codes. Imported successfully.
INSERT INTO "location" VALUES (103, 'BC', 'British Columbia', '13', 'Victoria City', '8', 'Hotels and Rooming Houses', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
After sorting out the ID issue, I've imported the SD16 data.
INSERT INTO "location" VALUES (102, 'BC', 'British Columbia', '13', 'Victoria City', '16', 'Beacon Hill', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
PD sent the SD16 (Beacon Hill) data set yesterday, which is missing unique IDs. As PD says, the ID numbers likely got lost during one of the many data shuffles from Access to Excel to various worksheets.
The highest ID in the census data is 31420. To ensure that any IDs used for SD16 aren't assigned to any other data sets (all of which do have unique IDs in the Excel spreadsheets), I wrote a small formula in LibreOffice to generate IDs starting with 31421. In the ID column, I simply entered "=ROW()+31419" for each row.
INSERT INTO "location" VALUES (100, 'BC', 'British Columbia', '13', 'Victoria City', '15', 'Burdette-Church Hill', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (101, 'BC', 'British Columbia', '13', 'Victoria City', '11', 'Jubilee', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (99, 'BC', 'British Columbia', '13', 'Victoria City', '23', 'Humboldt', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (98, 'BC', 'British Columbia', '13', 'Victoria City', '17', 'James Bay North', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
Received the data for Rockland-Foul Bay (SD 12) from PD yesterday and imported today, along with a handful of new occupations.
INSERT INTO "location" VALUES (95, 'BC', 'British Columbia', '13', 'Victoria City', '12', 'Rockland-Foul Bay', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
Received Fernwood-Fort Street from PD and imported. Large dataset of around 3400 records. Added in some new occupation codes.
The loader app is showing its limitations when handling large files such as this one. It could use a rewrite, since it was originally designed for Access files with only cursory support for CSVs. But with so much data already in, it's probably not worth the effort right now.
INSERT INTO "location" VALUES (94, 'BC', 'British Columbia', '13', 'Victoria City', '9', 'Fernwood-Fort Street', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (91, 'BC', 'British Columbia', '13', 'Victoria City', '2', 'Douglas-Gorge', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (92, 'BC', 'British Columbia', '13', 'Victoria City', '6', 'Hillside-Douglas', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (93, 'BC', 'British Columbia', '13', 'Victoria City', '21', 'Quadra-Harris Green', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
Received Hillside-Quadra data from PD and imported successfully to development database/website.
INSERT INTO "location" VALUES (90, 'BC', 'British Columbia', '13', 'Victoria City', '7', 'Hillside-Quadra', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (87, 'BC', 'British Columbia', '13', 'Victoria City', '25', 'James Bay', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (88, 'BC', 'British Columbia', '13', 'Victoria City', '24', 'James Bay Southeast', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (89, 'BC', 'British Columbia', '13', 'Victoria City', '18', 'James Bay Southwest', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (86, 'BC', 'British Columbia', '13', 'Victoria City', '14', 'Fairfield-Rockland', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO "location" VALUES (85, 'BC', 'British Columbia', '13', 'Victoria City', '20', 'Oriental District', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
INSERT INTO occupation (occupation_description, occupation_code) VALUES
('Pool Romm Attendant', '59990'),
('Setter', '81275'),
('Lamp Fitter', '87330'),
('General Finisher', '99999');
And a new location:
INSERT INTO "location" VALUES (84, 'BC', 'British Columbia', '13', 'Victoria City', '5', 'Central Park', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
Received and added some new and updated page copy from PD. Added to both the development and live sites.
- Added new "Notes on Enumeration Categories" page for 1901 census: http://vihistory.ca/content/census/1901/census1901.php?page=enumcats
- Updated 1901 introduction page: http://vihistory.ca/content/census/1901/census1901.php?page=introduction
- Update 1901 main index page (with link to enumeration categories page): http://vihistory.ca/content/census/1901/census1901.php?page=main
Received the latest set of 1911 census data from PD - subdistrict, #1, Vic West. Imported into the development database smoothly and without incident.
Also inserted six new occupations:
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Pianist', 17140);
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Show Woman', 17225);
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Butter Maker', 77530);
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Gas Fitter', 87120);
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Cement Worker', 95200);
INSERT INTO occupation (occupation_description, occupation_code) VALUES ('Road Maker', 97415);
And a new location for the sub district:
INSERT INTO "location" VALUES (83, 'BC', 'British Columbia', '13', 'Victoria City', '1', 'Victoria West', NULL, NULL, NULL, NULL, NULL, '1911', NULL, NULL);
Added more views to the PostgreSQL database to generate lists of values that are present in the census data, for the advanced search form.
vOriginsUsed - used for the "Birthplace", "Birthplace of Father", and "Birthplace of Mother" fields:
CREATE VIEW "vOriginsUsed" AS
SELECT origin_id, origin
FROM "origins"
WHERE origin IS NOT NULL
AND origins.origin_id IN (
SELECT "vCensus1881".birthplace_id FROM "vCensus1881"
UNION
SELECT "vCensus1881".birthplace_father_id FROM "vCensus1881"
UNION
SELECT "vCensus1891".birthplace_id FROM "vCensus1891"
UNION
SELECT "vCensus1891".birthplace_father_id FROM "vCensus1891"
UNION
SELECT "vCensus1891".birthplace_mother_id FROM "vCensus1891"
UNION
SELECT "vCensus1901".birthplace_id FROM "vCensus1901"
UNION
SELECT "vCensus1911".birthplace_id FROM "vCensus1911"
)
ORDER BY upper(origin)
The origins view is pretty slow because of the large number of SELECT subqueries.
vNationalitiesUsed - used for the "Nationality" search field:
CREATE VIEW "vNationalitiesUsed" AS
SELECT nationality_id, nationality
FROM "nationalities"
WHERE nationality IS NOT NULL
AND nationalities.nationality_id IN (
SELECT "vCensus1901".nationality_id FROM "vCensus1901"
UNION
SELECT "vCensus1911".nationality_id FROM "vCensus1911"
)
ORDER BY upper(nationality)
vRacesUsed - used for the "Race/Ethnic Origin" search field:
CREATE VIEW "vRacesUsed" AS
SELECT race_id, race_name
FROM "races"
WHERE race_name IS NOT NULL
AND races.race_id IN (
SELECT "vCensus1901".race_id FROM "vCensus1901"
UNION
SELECT "vCensus1911".race_id FROM "vCensus1911"
)
ORDER BY upper(race_name)
vBandsUsed - used for the "First Nations Band" search field:
CREATE VIEW "vBandsUsed" AS
SELECT band_id, band
FROM "band"
WHERE band IS NOT NULL
AND band.band_id IN (
SELECT "vCensus1881".band_id FROM "vCensus1881"
UNION
SELECT "vCensus1891".band_id FROM "vCensus1891"
)
ORDER BY upper(band)
vMotherTonguesUsed - used for the "Mother Tongue", "First Language Spoken", and "Second Language Spoken" search fields:
CREATE VIEW "vMotherTonguesUsed" AS
SELECT mother_tongue_id, mother_tongue
FROM "mother_tongues"
WHERE mother_tongue IS NOT NULL
AND mother_tongues.mother_tongue_id IN (
SELECT "vCensus1901".mother_tongue_id FROM "vCensus1901"
UNION
SELECT "vCensus1911".first_language_id FROM "vCensus1911"
UNION
SELECT "vCensus1911".second_language_id FROM "vCensus1911"
)
ORDER BY upper(mother_tongue)
CREATE VIEW "vOccupationsUsed" AS
SELECT DISTINCT ON (upper(occupation_description)) occupation_description AS occupation_description, occupation_id, occupation_code
FROM "occupation"
WHERE
occupation_description IS NOT NULL
AND
(
(occupation.occupation_code IN (SELECT "vOccupations1881".code FROM "vOccupations1881" WHERE "vOccupations1881".count1881 > 0))
OR
(occupation.occupation_code IN (SELECT "vOccupations1891".code FROM "vOccupations1891" WHERE "vOccupations1891".count1891 > 0))
OR
(occupation.occupation_code IN (SELECT "vOccupations1901".code FROM "vOccupations1901" WHERE "vOccupations1901".count1901 > 0))
OR
(occupation.occupation_code IN (SELECT "vOccupations1911".code FROM "vOccupations1911" WHERE "vOccupations1911".count1911 > 0))
)
ORDER BY upper(occupation_description)
Since location_id isn't part of the 1911 records insert, but sub_district_id is, use this SQL to update the location_id after you've put a row into the location table corresponding to the new sub-district:
UPDATE census_1911 SET location_id = (SELECT location.location_id FROM location WHERE location_02_id = '13' AND location_03_id = CAST(census_1911.sub_district_id AS character varying(50)) AND census_year = '1911')
The loader now calls pg_insert() or pg_update(), depending on whether any conditions exist. Aside from allowing for condition support, this change also keeps the data more organized and takes out the guesswork involved in building raw SQL queries via a for loop.
I also had to edit the _fix_field() function in inc/lib.inc to stop the automatic escaping of data values into single quotes (e.g. 'data'). pg_insert() and pg_update() escape automatically, so the code in _fix_field() was resulting in double escaping.
PD recently sent me an updated version of the 1901 data with expanded address information, namely street number and street name separated into two fields. This will allow us to have street searching for both 1901 and 1911. However, the loader application didn't support updating existing rows, only importing new ones. So, I had to build update capabilities.
The loader now supports "conditionKey" lines, which can be used to specify conditions that need to be met in order to perform an update. The format of the line is:
conditionKey=SourceDataColumnName:DatabaseTableColumnName
For example, this is the condition key line that I specified when importing the new address data for 1901:
conditionKey=census_record_id:census_record_id
The field on the left of the colon tells the loader to look at the value of the census_record_id
column in the source data CSV, while the field on the right side of the colon tells the loader to check that value against the census_record_id
column in the database table.
Internally, adding one or more conditions causes the loader to use the pg_update()
function instead of the pg_insert
function for the row in question.
Multiple conditionKey lines may be specified.
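Put together, the row-handling logic inside the loader presumably looks something like this sketch (the function and variable names are mine, and I'm assuming the row array is already keyed by database column names; pg_insert() and pg_update() are the real PHP functions mentioned above):
<?php
// Sketch only: write one row, inserting when there are no conditions and
// updating when one or more conditionKey lines were specified.
function write_row($conn, $table, array $row, array $conditionColumns) {
    if (count($conditionColumns) === 0) {
        return pg_insert($conn, $table, $row); // no conditions: plain insert, as before
    }
    $condition = array();
    foreach ($conditionColumns as $col) {
        $condition[$col] = $row[$col]; // e.g. census_record_id
    }
    return pg_update($conn, $table, $row, $condition);
}
?>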
Added 11 new fields to the views for each census year. The only view that actually contains data in these fields is 1911; the other years just have NULL values. All of the census year views (vCensus1881 - 1911) need to have the same list of columns since UNION queries are performed during census searches.
The new fields in the vCensus1911 view:
census_1911.employment_state,
census_1911.other_employment,
census_1911.weeks_working,
census_1911.hours_working,
census_1911.hourly_wage,
census_1911.weeks_working_other_employment,
census_1911.hours_working_other_employment,
census_1911.life_insurance,
census_1911.health_insurance,
census_1911.insurance_cost,
census_1911.education_cost
The new fields in the other views:
NULL::character varying(255) AS employment_state,
NULL::character varying(100) AS other_employment,
NULL::integer AS weeks_working,
NULL::integer AS hours_working,
NULL::numeric(6,2) AS hourly_wage,
NULL::integer AS weeks_working_other_employment,
NULL::integer AS hours_working_other_employment,
NULL::numeric(8,2) AS life_insurance,
NULL::numeric(8,2) AS health_insurance,
NULL::character varying(100) AS insurance_cost,
NULL::numeric(12,2) AS education_cost
Finished the first pass at the vCensus1911 view, which is the main census query view for the census_1911 table. Since all of the census year views need to have exactly the same columns, I started with the vCensus1901 view and modified the column definitions to suit the new data. Another consequence is that, because view definitions need to be the same for every year, any new data columns in the 1911 view need to be retroactively put into the earlier views, usually as NULL returns. So far, I've put in just two: secondary_language_id and second_language. The 1911 census is the first dataset to have a second language option, so the earlier views need to have NULL fields for the second language columns. The NULL fields look like this:
NULL::integer AS secondary_language_id,
NULL::varchar(100) AS second_language
The vCensus1911 view, in all its glory, looks like this:
CREATE VIEW "vCensus1911" AS
SELECT
census_1911.census_record_id,
'1911'::character varying(4) AS census,
census_1911.location_id,
census_1911.family_id,
NULL::character varying(10) AS case_number,
initcap(census_1911.family_name) AS family_name,
initcap(census_1911.given_name) AS given_name,
NULL::character varying(100) AS middle_name,
census_1911.sex,
census_1911.age,
census_1911.birthyear::double precision,
census_1911.birthmonth::double precision,
NULL::double precision as birthday,
census_1911.marital_status_id,
census_1911.relationship_to_head_id,
census_1911.birthplace_id,
NULL::integer AS birthplace_father_id,
NULL::integer AS birthplace_mother_id,
NULL::boolean AS french_canadian,
census_1911.occupation_id,
census_1911.religion_id,
NULL::integer AS building_id,
NULL::integer AS construction_id,
NULL::smallint AS floors,
NULL::smallint AS rooms,
census_1911.employment,
NULL::integer AS employees,
census_1911.read,
census_1911.write,
NULL::boolean AS school,
census_1911.months_school,
NULL::integer AS infirmities_id,
census_1911.comments,
NULL::integer AS band_id,
census_1911.earnings,
census_1911.earnings_extra,
NULL::integer AS earnings_period_id,
(census_1911.earnings + census_1911.earnings_extra)::numeric(6,2) AS total_earnings,
census_1911.english,
census_1911.french,
census_1911.mother_tongue_id,
census_1911.immigration_year,
census_1911.naturalization_year::smallint AS naturalization_year,
NULL::integer AS birthplace_type_id,
NULL::integer AS colour_id,
census_1911.race_id,
census_1911.nationality_id,
initcap(census_1911.address) AS address,
census_1911.age_months,
location.location_01_id,
location.location_01,
location.location_02_id,
location.location_02,
location.location_03_id,
location.location_03,
location.location_04_id,
location.location_04,
location.comments AS loc_comments,
marital_status.marital_status,
birth.origin AS birth_place,
NULL::character varying(255) AS birth_place_father,
NULL::character varying(255) AS birth_place_mother,
occupation.occupation_code,
occupation.occupation_description,
religion.religion,
NULL::character varying(100) AS building,
NULL::character varying(100) AS construction,
array_to_string(array(SELECT infirmities.infirmities FROM infirmities INNER JOIN census_record_infirmities ON census_record_infirmities.census_year = '1911' AND census_record_infirmities.infirmities_id = infirmities.infirmities_id INNER JOIN census_1911 AS temp_census ON temp_census.census_record_id = census_record_infirmities.census_record_id WHERE census_record_infirmities.census_record_id = census_1911.census_record_id), ', ')::character varying(100) AS infirmities,
relationship_to_head.relationship_to_head,
NULL::character varying(15) AS earning_period_name,
mother_tongues.mother_tongue,
NULL::character varying(15) AS birthplace_type,
NULL::character varying(20) AS colour,
races.race_name,
nationalities.nationality,
NULL::character varying(255) AS band,
census_1911.secondary_language_id,
secondary.mother_tongue AS second_language,
( SELECT count(annotations.annotation_id) AS count
FROM annotations
WHERE annotations.table_id = 7 AND annotations.record_id = census_1911.census_record_id) AS has_annote
FROM census_1911
LEFT JOIN location ON location.location_id = census_1911.location_id
LEFT JOIN marital_status ON marital_status.marital_status_id = census_1911.marital_status_id
LEFT JOIN relationship_to_head ON relationship_to_head.relationship_to_head_id = census_1911.relationship_to_head_id
LEFT JOIN origins birth ON birth.origin_id = census_1911.birthplace_id
LEFT JOIN occupation ON occupation.occupation_id = census_1911.occupation_id
LEFT JOIN religion ON religion.religion_id = census_1911.religion_id
LEFT JOIN mother_tongues ON mother_tongues.mother_tongue_id = census_1911.mother_tongue_id
LEFT JOIN mother_tongues secondary ON secondary.mother_tongue_id = census_1911.secondary_language_id
LEFT JOIN races ON races.race_id = census_1911.race_id
LEFT JOIN nationalities ON nationalities.nationality_id = census_1911.nationality_id;
The SELECT GROUP_CONCAT subquery for Martin's Adaptive DB (see http://hcmc.uvic.ca/blogs/index.php?blog=44&p=7646&more=1&c=1&tb=1&pb=1), which is what I had planned to use, wouldn't fly because PostgreSQL doesn't support GROUP_CONCAT. Luckily, there's another (and arguably better) solution using the array_to_string() function together with an ARRAY subquery:
array_to_string(array(SELECT infirmities.infirmities FROM infirmities INNER JOIN census_record_infirmities ON census_record_infirmities.census_year = '1911' AND census_record_infirmities.infirmities_id = infirmities.infirmities_id INNER JOIN census_1911 AS temp_census ON temp_census.census_record_id = census_record_infirmities.census_record_id WHERE census_record_infirmities.census_record_id = census_1911.census_record_id), ', ') AS infirmities
So, the SELECT query, which returns 0 or more rows with one column each, is cast as an array. That array is then converted to a string, using the separator defined at the end of the function to join the values together.
Created some views for the 1911 data, to match previous years (haven't yet created the big search view for 1911):
Religion count:
CREATE VIEW "vReligions1911" AS
SELECT religion.religion_id AS code, religion.religion AS name, count(census_1911.census_record_id) AS count1911 FROM (religion LEFT JOIN census_1911 ON ((census_1911.religion_id = religion.religion_id))) GROUP BY religion.religion_id, religion.religion;
Occupation count:
CREATE VIEW "vOccupations1911" AS
SELECT occupation.occupation_code AS code, occupation.occupation_description AS name, count(census_1911.census_record_id) AS count1911 FROM (occupation LEFT JOIN census_1911 ON ((census_1911.occupation_id = occupation.occupation_id))) GROUP BY occupation.occupation_code, occupation.occupation_description;
Location count (2 views):
CREATE VIEW "vCensus1911LocCount" AS
SELECT census_1911.location_id, count(census_1911.census_record_id) AS loc_count FROM census_1911 GROUP BY census_1911.location_id;
CREATE VIEW "vLocation1911Count" AS
SELECT location.location_id, location.location_01_id, location.location_01, location.location_02_id, location.location_02, location.location_03_id, location.location_03, location.location_04_id, location.location_04, location.comments, location.notes, location.map_name, location.census_year, "vCensus1911LocCount".loc_count FROM (location LEFT JOIN "vCensus1911LocCount" ON (("vCensus1911LocCount".location_id = location.location_id))) WHERE ((location.census_year)::text = '1911'::text);
I further expanded the loader application and Loader class to do an optional check for existing records before importing data. This is useful when importing new values to an existing database table using a dataset that has existing as well as new data. I wrote this feature because PD sent me occupation and relationship-to-head code CSVs which contain both codes that are already in the database tables (occupation and relationship_to_head respectively) from previous census years and new codes needed for the 1911 census. I couldn't just delete the existing data in the tables and do a fresh re-import because the tables contain some extra data not in PD's spreadsheets. So, I devised a feature for the loader that allows for one or more "existing key" lines to be specified in the map file for the dataset. The value of each specified key in the row of data is queried against the database. Multiple keys may be specified to check multiple values.
The format of the line is:
checkForExisting=SourceDataColumnName:DatabaseTableColumnName
For example, this is my map file for the new occupation data CSV file:
occupation_id='DEFAULT'
occupation_code=*pad(occupation_code,5,'0','left')
occupation_description=occupation_description
checkForExisting=occupation_code:occupation_code
Two fields are imported, occupation_code and occupation_description (they have the same names in both the database table and the CSV). occupation_id is generated as a 'DEFAULT' (i.e. auto-increment) value. The checkForExisting line specifies that, before importing a row, the value of that row's occupation_code (the left side of the colon) column will be checked against the database table's occupation_code (the right side of the colon). If there's a match then the row is skipped.
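Roughly, the check boils down to a small existence query per row. The sketch below is an approximation of that idea, not the Loader's actual code; names like $checks and row_already_exists() are my own.
<?php
// Approximate sketch of the checkForExisting logic. $checks maps CSV column
// names (left of the colon) to database column names (right of the colon);
// this sketch treats a match on any key as "already existing".
function row_already_exists($db, $table, $csvRow, $checks)
{
    foreach ($checks as $csvColumn => $dbColumn) {
        // Table and column names come from the map file, so they are trusted;
        // the value itself is passed as a query parameter.
        $sql    = "SELECT 1 FROM {$table} WHERE {$dbColumn} = $1 LIMIT 1";
        $result = pg_query_params($db, $sql, array($csvRow[$csvColumn]));
        if ($result && pg_num_rows($result) > 0) {
            return true;  // match found, so the loader skips this row
        }
    }
    return false;  // no match on any key: safe to import
}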
I've put the documentation for this feature in the Loader manual (basically word-for-word from this blog post).
I expanded the loader application's error/warning log system to use the Loader class I previously wrote. Warnings and errors are now gathered into arrays and then printed at the end, rather than during the iteration over rows. Warnings and errors are also now organized into 'line', 'message', 'type', and 'SQL' values in the Loader class. I also added an option to show either all errors/warnings complete with line numbers and SQL, or just a list of distinct errors, which will be useful for quickly assessing the variety of errors/warnings in an import.
I also modified the logging process so that, instead of having to specify a target directory for the log files each time the loader application is used, the logs are just saved to a "logs" subdirectory within the loader app. Much simpler and more intuitive.
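A rough sketch of that collect-then-report structure follows; the class, property, and method names here are my own approximations, not necessarily what the Loader class actually uses.
<?php
// Rough sketch of the error/warning collection described above.
class LoaderLog
{
    private $entries = array();

    // Record one problem instead of printing it during the row iteration.
    public function add($type, $line, $message, $sql = '')
    {
        $this->entries[] = array(
            'line'    => $line,     // source data line number
            'message' => $message,
            'type'    => $type,     // 'error' or 'warning'
            'SQL'     => $sql,
        );
    }

    // Print everything at the end: either full detail or distinct messages only.
    public function report($distinctOnly = false)
    {
        if ($distinctOnly) {
            $seen = array();
            foreach ($this->entries as $entry) {
                $seen[$entry['message']] = true;
            }
            foreach (array_keys($seen) as $message) {
                echo $message . "\n";
            }
            return;
        }
        foreach ($this->entries as $entry) {
            echo "[{$entry['type']}] line {$entry['line']}: {$entry['message']}"
               . ($entry['SQL'] !== '' ? " ({$entry['SQL']})" : '') . "\n";
        }
    }
}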
I've completed (I think) the necessary modifications to the handy Loader application originally written by DB, which was designed to import census data from an MS Access database or CSV file into a PostgreSQL table. A complete description of the application can be found in the Loader Manual document which is distributed with the program. Because both the format of the data and the development environment have changed substantially - we now receive the raw data from PD in CSV format, and we're developing on Linux instead of Windows (which is what DB was using) - quite a few changes were needed. I also made a few general improvements along the way.
General Improvements
- Completed the CSV importing portion of the code, which was never finished (because DB was focused on Access), and made that the default option rather than MS Access
- Created a Loader class (inc/loader.php) to abstract some of the functionality previously "spaghettied" in the main loader file, mainly loading the datasource and gathering all of the rows. The interface for both of these tasks, among others, is now the same whether a CSV file or Access DB is used
Mapping Function Improvements
I wrote some new mapping functions and modified some existing ones to help me import the 1911 data (complete documentation is in the Loader manual; all of these functions are in inc/lib.inc):
- Wrote a new _existsMultiple() function that works like the existing _exists() function but allows for testing multiple values (e.g. test whether a value exists in any specified row)
- Made the existing _is_true() function, which was just being used by other loader map functions, available for direct use in map files. It will test fields for human-readable true/false values ("Y", "yes", "N", "no", etc.) and return a boolean value
- Wrote a new _fractionalMonth() function, which takes a value like "3/12" and returns "3". This was written because the 1911 data from PD uses that format (3/12 and similar) to denote the age in months for infants
- Modified the existing _lookup() function to accept a string literal as a search value, rather than a reference to a source data column name
- Wrote a new _occupation() function to deal with both 4 and 5 digit occupation codes. The function formats the value to 5 digits and then passes everything on to _lookup()
One To Many Relationships
The Loader application did not support one-to-many (OTM) relationships. So, for example, if a census record could have multiple infirmities (i.e. census record -> has many infirmities), there was no way to properly record these multiple infirmities. Instead, the data was shoehorned into the census year data tables in an awkward way. So, for a more accurate representation of the 1911 (and beyond) data, in consultation with MH I decided to write OTM relationship support into the Loader application and mapping files.
The new system for mapping OTM relationships is somewhat convoluted and complex, but because I don't have the time or desire to rewrite the Loader application, I had to work within its confines. It uses the value of a specified CSV column and performs a lookup on a secondary table to grab the ID associated with that value. Then, it inserts that ID, along with the census record ID and any other data you want to pass, into the one to many lookup table.
A line looks like this (all one line):
oneToMany:RawDataColumn=one_to_many__table;target_column_in_otm_table;lookup_table,lookup_table_key_fields,lookup_table_return_field;extra_key:val_data
Or, a better concrete example (again, all one line):
oneToMany:Blind=census_record_infirmities;infirmities_id;infirmities,infirmities:"Blind",infirmities_id;census_year:1911
How it works:
First, the line is split at the '=', resulting in two arrays which we'll call LeftSide and RightSide. LeftSide[1] gives us the CSV column from which we'll be taking our data. RightSide contains all of the other info.
RightSide is itself exploded with a semicolon separator, and breaks down like this:
- 0: the table holding the OTM relationship records. In this case, census_record_infirmities is used for the census record -> has many infirmities relationship.
- 1: the name of the column in the OTM table, the value for which will be generated from LeftSide[1] (the data in the CSV column)
- 2: the information for the ID lookup. The arguments here, comma-separated, follow the same conventions as the _lookup() function (which is used to get the value we want - documented in the Loader manual).
- 3: An optional list of comma-separated key:value pairs which will be inserted into the OTM table without modification. In the above example, census_year:1911 means that "1911" will be inserted into the census_record_infirmities.census_year column for each record.
Here are all four infirmity oneToMany lines from my 1911 map file (a rough parsing sketch follows them):
oneToMany:Blind=census_record_infirmities;infirmities_id;infirmities,infirmities:"Blind",infirmities_id;census_year:1911
oneToMany:DeafAndDumb=census_record_infirmities;infirmities_id;infirmities,infirmities:"Deaf and Dumb",infirmities_id;census_year:1911
oneToMany:CrazyOrLunatic=census_record_infirmities;infirmities_id;infirmities,infirmities:"Crazy",infirmities_id;census_year:1911
oneToMany:IdioticOrSilly=census_record_infirmities;infirmities_id;infirmities,infirmities:"Helpless",infirmities_id;census_year:1911
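And here is that rough illustration of how such a line breaks apart. The variable names are my own assumptions, not the Loader's actual parsing code; $line is the part after the "oneToMany:" prefix.
<?php
// Rough illustration of how a oneToMany map line breaks apart.
$line = 'Blind=census_record_infirmities;infirmities_id;'
      . 'infirmities,infirmities:"Blind",infirmities_id;census_year:1911';

list($csvColumn, $rightSide) = explode('=', $line, 2);
$parts = explode(';', $rightSide);

$otmTable     = $parts[0];                // census_record_infirmities
$targetColumn = $parts[1];                // infirmities_id
$lookupSpec   = explode(',', $parts[2]);  // args for _lookup(): table, key:value, return field
$extraPairs   = isset($parts[3]) ? explode(',', $parts[3]) : array();  // e.g. census_year:1911

// If the CSV row's $csvColumn value tests true, the ID returned by the lookup
// (plus the extra key:value pairs) is inserted as a new row in $otmTable,
// keyed to the current census record.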
Thanks to PD, I've found the resource for the majority of standardized codes used in the database. The Canadian Families Project has a PDF user guide for its 1901 census database: http://web.uvic.ca/hrd/cfp/data/1901%20Census%20User%20Guide.pdf
The VIHistory database uses the following codes from the guide (found under the header 'Standard Variable Codes'):
- Birth place
- Infirmity
- Mother Tongue
- Nationality
- Race
- Religion
IMPORTANT NOTE: the codes in the guide for occupation and relationship to head are NOT used in the database. Instead:
- Relationship to head: this seems to be a custom system (unless I hear otherwise from PD or JL)
- Occupation: the occupation codes follow the ISCO 1968 classification system
Work done for March 15th includes:
census_1911 table schema:
- Changed family_id from integer to character varying(15) to allow for alphanumeric values such as "151a". Since the ID doesn't actually refer to anything else, it's fine to have a varchar value.
- Dropped the secondary_occupation_id column and added an other_employment character varying(100) column in its place. The OtherEmployment CSV column doesn't follow the same conventions as the ChiefOccupation column, so the occupation codes don't apply. The new other_employment column is simply a direct import of the OtherEmployment text.
- Added weeks_working_other_employment and hours_working_other_employment integer columns, which I forgot to make during the initial creation.
Occupation data:
- Inserted new rows into the occupation table (name (code)): Shipwright (87450), Master Mariner (98819), Delivery Man (98636), Secretary, School Board (39200), Teacher (13340)
Made a couple of schema changes to the census_1911 table today:
- Changed naturalization_year from smallint to character varying(50) to account for non-integer values in the data such as "*", "N/A", etc.
- Added a temporary sub_district_id field to store the subdistrict data. Once we receive complete census tract info for the locations table, this column will be used to populate the location_id column and then deleted.
The method for importing raw spreadsheet data into the PostgreSQL database involves making a text file with a number of "mapping" instructions for each column in the spreadsheet. These mapping files are used by the custom data importing application, "Loader", originally written by DB. For example, to map the "family_id" SQL column to the "FamilyName" CSV column, one would enter:
family_id:FamilyName
That tells the data importer to insert the value of the "FamilyName" raw data column into the "family_id" SQL table column. There's support for a number of functions as well, which transform the CSV data. The complete format and documentation for the mapping files is available in the ViHistory technical documentation.
The mapping of the 1911 data, which I'm still working on, has been fairly straightforward, with a few wrinkles:
- I wrote a new function for Loader called _existsMultiple(), which is an extended version of the existing _exists() function that can accept multiple semicolon-separated field names and/or values
- I made the existing function _is_true(), which was only used internally by other Loader functions, available for use in the map files to convert Y/N and other similar columns to boolean values
- Wrote a new _fractionalMonth() Loader function to find a value like "3/12" and return "3". This is because infants' ages have been represented as a fraction, and the month needs to be extracted to populate the age_months column (a rough sketch of the idea appears after this list)
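Here's a rough sketch of what that fractional-month handling amounts to. This is my own approximation; the real _fractionalMonth() lives in inc/lib.inc and may handle more edge cases.
<?php
// Approximation of the _fractionalMonth() behaviour described above.
function fractional_month($value)
{
    if (strpos($value, '/') !== false) {
        // "3/12" -> "3": keep only the numerator (the age in months).
        $pieces = explode('/', $value, 2);
        return trim($pieces[0]);
    }
    return $value;  // non-fractional values pass through untouched
}

echo fractional_month('3/12');  // prints 3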
I'm still waiting for PD to answer some questions that are preventing me from completing the mapping.
Just sent PD and JL two more questions about the 1911 data:
Multiple races and nationalities: for the race (RaceTribeOrOrigin in your spreadsheets) and nationality (Nationality in your spreadsheets) data, the Rock Bay dataset has some rows with multiple entries, such as "Canadian;American" for nationality and "Canadian;Welsh" for race. The existing nationalities and races data in the database doesn't currently have any dual entries like this - they're all single races or nationalities. Should we just create new races and nationalities to match the data (e.g. "Canadian-Welsh"), or do we need a more complex system that allows, for example, one census record to be associated with both the Canadian AND the Welsh races (see my explanation of "one to many" below)?
Infirmities: the infirmities data is tricky. In the 1911 data we have Blind, DeafAndDumb, CrazyOrLunatic, and IdioticOrSilly fields. Theoretically, one person may be all of these things (or at least a couple). The issue is that, in the existing census years, there's only one "infirmities_id" field, which corresponds to just a single infirmity. If a census record has more than one infirmity (e.g. Blind and CrazyOrLunatic), then it's just noted as a "Two or more of the above" infirmity, which means that the actual infirmity data itself is lost. This is a consequence of the current table structure, which only allows for one infirmity to be associated with each record.
So, we have a few options:
#1: keep the status quo; don't change how infirmities are stored (i.e. record either one specific infirmity or the "Two or more of the above" option).
#2: have separate boolean table columns for each infirmity type. The main consequence here is that, if the infirmities change from year to year, table structure will fluctuate wildly.
#3: use a third table to keep track of which infirmities belong to each record. This is called a "one to many" relationship and allows one record to be associated with many infirmities, without the need to have individual columns for each infirmity. This option is the most flexible, since it allows for an indefinite number of infirmities.
After some consultation with Martin, we believe that #3 is the best option for this project. It will result in more work (for me) than #1 or #2, but is a huge step towards normalized data, and, as I mentioned, is by far the most flexible of the options.
After a thorough examination of the sample 1911 census data given to me by PD, I've created a census_1911 table in the PostgreSQL database for the new data set. I used the census_1901 table as a starting point and added/subtracted columns as needed to match the new data. The 1911 table has 45 columns vs. 33 in the 1901 table, but that may change as we get more of the data; PD is sending us a spreadsheet for each subdistrict so he may be refining the format along the way.
I documented the census_1911 table with SQL comments. Here's the CREATE TABLE query, with comments:
CREATE TABLE census_1911
(
census_record_id serial NOT NULL, -- Unique ID
location_id integer, -- Foreign key reference to location table; generated from "SubDistrictNo" CSV column
family_id integer, -- Direct import of "FamilyNumber" CSV column
family_name character varying(100), -- Direct import of "lastName" CSV column
given_name character varying(100), -- Direct import of "FirstName" CSV column
sex character(1), -- Direct import of "Sex" CSV column, either "M" or "F"
age smallint, -- Direct import of "AgeAtLastBirthday" CSV column
marital_status_id integer, -- Conversion of "SingleMarriedWidowed" CSV column from text to marital_status foreign key reference
relationship_to_head_id integer, -- Foreign key reference to relationship_to_head table; "relHeadCode" CSV column corresponds to relationship_to_head.relationship_to_head_id
birthplace_id integer, -- Foreign key reference to origins table; generated from "PlaceOfBirth" CSV text column
occupation_id integer, -- Foreign key reference to occupation table; "viOccupationCode" CSV column corresponds to occupation.occupation_code
religion_id integer, -- Foreign key reference to religion table; "religionCode" CSV column corresponds to religion.religion_id
"read" boolean, -- Conversion of "CanRead" CSV column from Y/N to boolean field
"write" boolean, -- Conversion of "CanWrite" CSV column from Y/N to boolean field
infirmities_id integer, -- Foreign key reference to infirmities table; amalgamation of "Blind", "DeafAndDumb", "CrazyOrLunatic", and "IdioticOrSilly" Y/N CSV columns into a single field
comments text, -- Direct import of "Notes" CSV column
months_school smallint, -- Direct import of "MonthsInSchool" CSV column
earnings numeric(12,2), -- Direct import of "EarningsAtChiefOcc" CSV column
english boolean, -- Generated from presence of "ENGLISH" in "LanguageSpoken1" or "LanguageSpoken2" CSV columns
french boolean, -- Generated from presence of "FRENCH" in "LanguageSpoken1" or "LanguageSpoken2" CSV columns
immigration_year smallint, -- Direct import of "YearOfImmigration" CSV column
naturalization_year smallint, -- Direct import of "YearOfNaturalization" CSV column
earnings_extra numeric(6,2), -- Direct import of "EarningsAtOtherOcc" CSV column
race_id integer, -- Foreign key reference to races table; generated from "RaceTribeOrOrigin" CSV column
nationality_id integer, -- Foreign key reference to nationalities table; generated from "Nationality" CSV column
mother_tongue_id integer, -- Foreign key reference to mother_tongues table; generated from "LanguageSpoken1" CSV column
address character varying(255), -- Direct import of "Habitation" CSV column
age_months smallint, -- Import of "AgeAtLastBirthday" CSV column, but only if the value is fractional (i.e. below 1)
employment character varying(60), -- Amalgamation of "Employer", "Employee", and "WorkingOnOwnAccount" Y/N CSV columns to single, comma-separated text string
education_cost numeric(12,2), -- Direct import of "CostOfEducation" CSV column
life_insurance numeric(8,2), -- Direct import of "LifeInsurance" CSV column
health_insurance numeric(8,2), -- Direct import of "HealthInsurance" CSV column
insurance_cost character varying(100), -- Direct import of "CostOfInsurance" CSV column
weeks_working integer, -- Direct import of "WeeksWorkingChiefOcc" CSV column
hours_working integer, -- Direct import of "HoursWorkingChiefOcc" CSV column
hourly_wage numeric(6,2), -- Direct import of "RateOfEarningPerHour" CSV column
secondary_occupation_id integer, -- Foreign key reference to occupation table; generated from "OtherEmployment" CSV column
secondary_language_id integer, -- Foreign key reference to mother_tongues table; generated from "LanguageSpoken2" CSV column
employment_state character varying(255), -- Direct import of "StateWherePersonIsEmployed" CSV column
street_number character varying(50), -- Direct import of "StreetNumber" CSV column
street_name character varying(100), -- Direct import of "StreetName" CSV column
religion_text character varying(100), -- Direct import of "Religion" CSV column
occupation_text character varying(100), -- Direct import of "ChiefOccupation" CSV column
birthmonth integer, -- Conversion of "MonthofBirth" text CSV column to PostgreSQL month integer
birthyear integer, -- Direct import of "YearofBirth" CSV column
CONSTRAINT census_1911_pkey PRIMARY KEY (census_record_id),
CONSTRAINT census_1911_birthplace_id_fkey FOREIGN KEY (birthplace_id)
REFERENCES origins (origin_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT census_1911_infirmities_id_fkey FOREIGN KEY (infirmities_id)
REFERENCES infirmities (infirmities_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT census_1911_location_id_fkey FOREIGN KEY (location_id)
REFERENCES "location" (location_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT census_1911_marital_status_id_fkey FOREIGN KEY (marital_status_id)
REFERENCES marital_status (marital_status_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT census_1911_mother_tongue_id FOREIGN KEY (mother_tongue_id)
REFERENCES mother_tongues (mother_tongue_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT census_1911_nationality_id FOREIGN KEY (nationality_id)
REFERENCES nationalities (nationality_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT census_1911_occupations_id_fkey FOREIGN KEY (occupation_id)
REFERENCES occupation (occupation_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT,
CONSTRAINT census_1911_race_id FOREIGN KEY (race_id)
REFERENCES races (race_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT census_1911_relationship_to_head_id_fkey FOREIGN KEY (relationship_to_head_id)
REFERENCES relationship_to_head (relationship_to_head_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT census_1911_religion_id_fkey FOREIGN KEY (religion_id)
REFERENCES religion (religion_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=TRUE
);
COMMENT ON COLUMN census_1911.census_record_id IS 'Unique ID';
COMMENT ON COLUMN census_1911.location_id IS 'Foreign key reference to location table; generated from "SubDistrictNo" CSV column';
COMMENT ON COLUMN census_1911.family_id IS 'Direct import of "FamilyNumber" CSV column';
COMMENT ON COLUMN census_1911.family_name IS 'Direct import of "lastName" CSV column';
COMMENT ON COLUMN census_1911.given_name IS 'Direct import of "FirstName" CSV column';
COMMENT ON COLUMN census_1911.sex IS 'Direct import of "Sex" CSV column, either "M" or "F"';
COMMENT ON COLUMN census_1911.age IS 'Direct import of "AgeAtLastBirthday" CSV column';
COMMENT ON COLUMN census_1911.marital_status_id IS 'Conversion of "SingleMarriedWidowed" CSV column from text to marital_status foreign key reference';
COMMENT ON COLUMN census_1911.relationship_to_head_id IS 'Foreign key reference to relationship_to_head table; "relHeadCode" CSV column corresponds to relationship_to_head.relationship_to_head_id';
COMMENT ON COLUMN census_1911.birthplace_id IS 'Foreign key reference to origins table; generated from "PlaceOfBirth" CSV text column';
COMMENT ON COLUMN census_1911.occupation_id IS 'Foreign key reference to occupation table; "viOccupationCode" CSV column corresponds to occupation.occupation_code';
COMMENT ON COLUMN census_1911.religion_id IS 'Foreign key reference to religion table; "religionCode" CSV column corresponds to religion.religion_id';
COMMENT ON COLUMN census_1911."read" IS 'Conversion of "CanRead" CSV column from Y/N to boolean field';
COMMENT ON COLUMN census_1911."write" IS 'Conversion of "CanWrite" CSV column from Y/N to boolean field';
COMMENT ON COLUMN census_1911.infirmities_id IS 'Foreign key reference to infirmities table; amalgamation of "Blind", "DeafAndDumb", "CrazyOrLunatic", and "IdioticOrSilly" Y/N CSV columns into a single field';
COMMENT ON COLUMN census_1911.comments IS 'Direct import of "Notes" CSV column';
COMMENT ON COLUMN census_1911.months_school IS 'Direct import of "MonthsInSchool" CSV column';
COMMENT ON COLUMN census_1911.earnings IS 'Direct import of "EarningsAtChiefOcc" CSV column';
COMMENT ON COLUMN census_1911.english IS 'Generated from presence of "ENGLISH" in "LanguageSpoken1" or "LanguageSpoken2" CSV columns';
COMMENT ON COLUMN census_1911.french IS 'Generated from presence of "FRENCH" in "LanguageSpoken1" or "LanguageSpoken2" CSV columns';
COMMENT ON COLUMN census_1911.immigration_year IS 'Direct import of "YearOfImmigration" CSV column';
COMMENT ON COLUMN census_1911.naturalization_year IS 'Direct import of "YearOfNaturalization" CSV column';
COMMENT ON COLUMN census_1911.earnings_extra IS 'Direct import of "EarningsAtOtherOcc" CSV column';
COMMENT ON COLUMN census_1911.race_id IS 'Foreign key reference to races table; generated from "RaceTribeOrOrigin" CSV column';
COMMENT ON COLUMN census_1911.nationality_id IS 'Foreign key reference to nationalities table; generated from "Nationality" CSV column';
COMMENT ON COLUMN census_1911.mother_tongue_id IS 'Foreign key reference to mother_tongues table; generated from "LanguageSpoken1" CSV column';
COMMENT ON COLUMN census_1911.address IS 'Direct import of "Habitation" CSV column';
COMMENT ON COLUMN census_1911.age_months IS 'Import of "AgeAtLastBirthday" CSV column, but only if the value is fractional (i.e. below 1)';
COMMENT ON COLUMN census_1911.employment IS 'Amalgamation of "Employer", "Employee", and "WorkingOnOwnAccount" Y/N CSV columns to single, comma-separated text string';
COMMENT ON COLUMN census_1911.education_cost IS 'Direct import of "CostOfEducation" CSV column';
COMMENT ON COLUMN census_1911.life_insurance IS 'Direct import of "LifeInsurance" CSV column';
COMMENT ON COLUMN census_1911.health_insurance IS 'Direct import of "HealthInsurance" CSV column';
COMMENT ON COLUMN census_1911.insurance_cost IS 'Direct import of "CostOfInsurance" CSV column';
COMMENT ON COLUMN census_1911.weeks_working IS 'Direct import of "WeeksWorkingChiefOcc" CSV column';
COMMENT ON COLUMN census_1911.hours_working IS 'Direct import of "HoursWorkingChiefOcc" CSV column';
COMMENT ON COLUMN census_1911.hourly_wage IS 'Direct import of "RateOfEarningPerHour" CSV column';
COMMENT ON COLUMN census_1911.secondary_occupation_id IS 'Foreign key reference to occupation table; generated from "OtherEmployment" CSV column';
COMMENT ON COLUMN census_1911.secondary_language_id IS 'Foreign key reference to mother_tongues table; generated from "LanguageSpoken2" CSV column';
COMMENT ON COLUMN census_1911.employment_state IS 'Direct import of "StateWherePersonIsEmployed" CSV column';
COMMENT ON COLUMN census_1911.street_number IS 'Direct import of "StreetNumber" CSV column';
COMMENT ON COLUMN census_1911.street_name IS 'Direct import of "StreetName" CSV column';
COMMENT ON COLUMN census_1911.religion_text IS 'Direct import of "Religion" CSV column';
COMMENT ON COLUMN census_1911.occupation_text IS 'Direct import of "ChiefOccupation" CSV column';
COMMENT ON COLUMN census_1911.birthmonth IS 'Conversion of "MonthofBirth" text CSV column to PostgreSQL month integer';
COMMENT ON COLUMN census_1911.birthyear IS 'Direct import of "YearofBirth" CSV column';
PD gave me the 1911 census data for the Rock Bay subdistrict yesterday. After a thorough examination of the data and a comparison to previous years, some questions arose. I sent PD and JL these questions via email (text is unedited):
Employment types:
There are three fields: "Employer", "Employee", and "WorkingOnOwnAccount". It looks like these can have three possible values: "Y", "N", and blank. My first instinct was to store these as three separate fields in the 1911 census table, but, looking at the data for previous years, these are actually stored as a single text field, separated by commas, called "employment". For example, a 1901 record might have the text "Employer, Own Account" in the "employment" field instead of separating the data into separate fields. This struck me as odd, since, at least in the 1901 data, the original Access file also has the fields separated. Were you aware that the data was converted from multiple fields to a single field? Though my instinct is to keep the fields separate, you may have had good reason to convert these fields.
Another consequence of converting the separate fields into one big text field is that you lose the distinction between "N" for no and blank for refusal to answer. If we maintain separate fields for the different employment types, then census records with a refusal to answer can be properly recorded.
The location table:
To be blunt, I'm confused by the structure of the location table. I think I just need an explanation about how the table structure works, and how to make new locations without going against the current conventions. I've attached the locations table data as a CSV - would you be able to give me a quick rundown of how the various place names in each record relate to each other? For example, "British Columbia -> Vancouver -> Cowichan" seems a bit odd to me. If you can explain the relationship to me and also show me how a new entry would work - for, say, the Rock Bay subdistrict in the 1911 data - then I'll be able to handle new census tracts as you give them to me.
Meeting with JN, PD and JL to plan the next phase of the project. Outcomes:
- The 1911 data is now in the form of a spreadsheet, which will be provided to us in CSV format. JN will adapt the original import tool so that it can use CSV instead of relying on an ODBC connection to an Access db.
- There are still some questions about the 1911 data itself, which require input from ES -- for instance, there are three occupation codes, but only one question on the census about occupation; what are these codes, and how were they derived from the source data? It seems likely that some fields will be combined, eliminated, or not used in the interface.
- The new data will be organized into ten separate files by census tract, and a field needs to be added to record the census tract.
- JN will send PD the table of "relationships to heads of households" table data.
- There are some corrections to previous years' data which need to be made on a record-by-record basis. These have been done in a spreadsheet, so some mechanism will have to be used to parse that spreadsheet and compare it to the database, then make corrections where necessary.
- There are about 600 records for a previous year which need to be entered. Again, if we get those in CSV format they should be easy to import.
- There are two new databases of building permits, one from TC announcements and the other from city records. These will need to be integrated in the same way that e.g. tax assessment data is currently integrated, with the possibility of searching by name and possibly also by address (now that more of the tables will contain address information).
- Records in 1911 will ultimately be mapped to property ids, which will enable lookup of geo data for them. The mechanism for this is yet to be worked out.
- We'll also look into possible ways of adding address data which has been compiled for earlier census years from street directories to the census tables for the earlier years. This might not be too complicated if the addresses have been tied to the ids of existing records in the database; if not, it might take some ingenuity.
In order to do proper development and testing on VIHistory I needed a proper LAMP stack with PostgreSQL support on my machine. I didn't want to develop on the live site, and I needed more features than releases such as XAMPP and LAPPStack could offer, namely ODBC support for the Microsoft Access -> PostgreSQL conversion process (PD and JL provide us with data in MS Access format). So, following a number of guides(*), I successfully installed my own LAMP stack - Linux, Apache, MySQL, and PHP - with PostgreSQL.
Initial package setup (via Synaptic):
apache2 apache2-mpm-prefork apache2-utils apache2.2-common libapache2-mod-php5 libdbd-mysql-perl libmyodbc libmysqlclient mysql-client-5.0 mysql-common mysql-server-5.0 mysql-server-core-5.0 odbcinst1debian php5 php5-cli php5-common php5-odbc unixodbc postgresql-8.4 postgresql-client-8.4 postgresql-client-common postgresql-common postgresql-contrib php5-pgsql libapache2-mod-auth-mysql libapache2-mod-auth-pgsql php5-mysql phpmyadmin pgadmin3 iodbc mdbtools libmdbodbc
In addition to the basic stuff I installed a few bells and whistles such as phpMyAdmin for MySQL and pgadmin3 for PostgreSQL administration.
Setup is minimal. Your default web root will be /var/www. Assuming all of the packages installed correctly, you should see the default Apache page at http://localhost/ immediately. Note that, by default, anyone who knows your IP address will be able to connect to your new web server. To allow only your machine to connect to your server:
sudo pico /etc/apache2/ports.conf
- Look for this line:
Listen 80
- Change to:
Listen 127.0.0.1:80
- Save and close
- Restart apache:
sudo /etc/init.d/apache2 restart
- You're done.
To enable phpMyAdmin support: add this to the bottom of /etc/apache2/apache2.conf:
# Include phpMyAdmin
Include /etc/phpmyadmin/apache.conf
To initialize the postgres user for PostgreSQL:
1. sudo su postgres -c "psql -d template1"
2. ALTER USER postgres WITH PASSWORD 'YourUniquePasswordHere';
3. \q
4. sudo passwd -d postgres
5. sudo su postgres -c passwd
6. Enter the password you specified in step #2
Steps 1-3 deal with setting up the postgres user in PostgreSQL, while steps 4-6 set the postgres user's system password. From there, you can set up additional users (see references below).
ODBC connectivity - here's where it gets a bit dicey. Connecting to MySQL is simple (see the ODBC reference below), but MS Access is trickier since there's no native ODBC support for Linux. There are a few commercial options for Linux MS Access ODBC drivers, but they're astronomically expensive. After a lot (a lot!) of digging around, I finally came across a blog post from 2003 that explains how to get MS Access-ODBC connectivity on Linux using open source tools. Unfortunately, these open source tools (I think) only allow for read access, while the commercial alternatives can also write. Also unfortunately, the website is down at the time of writing so I had to load a cached version of the page, which takes nearly forever but does eventually work. Since it's from 2003 it's a bit outdated, but as long as you've installed the necessary packages (in the list above: unixodbc, mdbtools, libmdbodbc), it still holds up. So, to add an MS Access ODBC connection:
- Install the necessary packages (already done if you followed the list above) - the key is libmdbodbc
- Add this to /etc/odbcinst.ini, which is probably empty (or doesn't exist yet):
[MDBToolsODBC]
Description = MDB Tools ODBC drivers
Driver = /usr/lib/libmdbodbc.so.0
Setup =
FileUsage = 1
CPTimeout =
CPReuse =
- Define a datasource name (DSN), which basically means adding a unique entry to /etc/odbc.ini (which might be empty or not yet there) which links up the ODBC driver we just created with the database to which we want to connect:
[MyAccessFile]
Description = My Microsoft Access Database
Driver = MDBToolsODBC
Database = /path/to/access/file.mdb
Servername = localhost
UserName =
Password =
port = 5432
- As long as you've put all of the files in the right places, you should have a working ODBC-Access connection (see the test section below).
Now to test your MySQL, PostgreSQL, and Microsoft Access ODBC connections with this simple script, which I put in /var/www/test.php:
<?php
$mysql = mysql_connect('localhost', 'username', 'password');
echo "MySQL: ";
echo $mysql ? "success" : mysql_error();
echo "<br />";
$pg = pg_connect("host=localhost port=5432 dbname=template1 user=postgres password=yourpostgrespassword");
echo "PostgreSQL: ";
echo $pg ? "success" : pg_last_error();
echo "<br />";
$odbc = odbc_connect("MyAccessFile", "", "");
echo "ODBC: ";
echo $odbc ? "success" : odbc_errormsg();
?>
* - References:
- Ultra-basic LAMP installation for Ubuntu: http://www.howtoforge.com/ubuntu_lamp_for_newbies
- Setting up phpMyAdmin: http://www.howtoforge.com/ubuntu_debian_lamp_server
- Package list for PHP with ODBC support on Ubuntu: http://www.php.net/manual/en/function.odbc-connect.php#94257
- Setting up PostgreSQL: http://www.ubuntugeek.com/howto-setup-database-server-with-postgresql-and-pgadmin3.html
- Adding users to a PostgreSQL database: http://www.cyberciti.biz/faq/howto-add-postgresql-user-account/
- Microsoft Access Database using Linux and PHP (Yahoo cache version): http://74.6.117.48/search/srpcache?ei=UTF-8&p=http://bryanmills.net/archives/2003/11/microsoft-access-database-using-linux-and-php/&fr=sfp&u=http://cc.bingj.com/cache.aspx?q=http://bryanmills.net/archives/2003/11/microsoft-access-database-using-linux-and-php/&d=4680032763580841&mkt=en-US&setlang=en-US&w=463c4d2e,a36d7b22&icp=1&.intl=us&sig=A7p1jhQjSG4M7jIlS5N85Q--
Modified the spambot .htaccess redirection code. Rather than sending suspected spambots a 403 error, they're now getting redirected here, just in case a legit user has the same user agent as a spam bot:
We seem to be having a timing issue with domain names for our database server. If we use
$connect_str = "host=db.tapor.uvic.ca . . . ";
connection works. We're migrating to new domain names, but when we use:
$connect_str = "host=pgdev.tapor.uvic.ca . . . ";
the connection fails.
The instance of VIHist on lettuce is running against mapserver 4.10. The instance of VIHist on mustard is running against mapserver 5.x. (Same is true for the two instances of MapOfLondon experimental map.) Both instances of MoL work, but only the instance of VIHist running on mapserver 4.x is working. We're not sure if the VIHist code includes a call that is deprecated in 5.x (most likely), or if the build of 5.x happens to not have a lib that VIHist needs (but that MoL doesn't need). If all else fails, we'll put a build of mapserver 4.x on mustard, so that the instance of VIHist on that machine will work, and allow us to complete the change to the new clustered setup with new domain name for the db connections.
1. Victoria Census 1911
Add table with all this data and integrate this with the search
interfaces for the global name search and for the cross-census search
2. Victoria Tax Assessment Rolls, 1891
Add records to existing table
Modify occupation codes (may entail modifying records in occupation
table too)
Modify street addresses to separate street number from street name
3. Business Directory 1871
Add table, integrate into search + report code for business directories (and possibly global name search)
4. Business Directory 1877
Add table, integrate into search + report code for business directories (and possibly global name search)
5. Building permits 1870 - 1917
Add table(s), search and report specific to this dataset, integrate into
global name search
6. Check census 1891
Add table, search and report specific to this dataset, possibly integrate into cross census search and/or global name search
7. 1891 census
Add records.
If I recall correctly:
we're holding off on #1 the 1911 census until the other tasks are done.
for #2, who is providing the additional 1891 Tax Assessment roll records? I'll send a separate email on the occupation codes.
JL was to provide me with some datasets; is that all of #3 Business Directory 1871, #4 Business Directory 1877, #5 Building Permits 1870 - 1917 and #6 check census 1891? If not, then which are you providing and where are the remaining datasets coming from?
The first priority is #7 1891 census add records. For that I am waiting for data from PD.
I've got a file called Victoria_1911_sample.accdb which I believe is a sample of 100 records from the 1911 census. I'm not sure if I'll be able to look at it while we're in our temporary digs, but this isn't the first priority task in any case.
In the vihistory site, these three files had html injected into them by a hacker exploiting the fact that those pages use an argument in the GET array to specify a file which is then included into the host file.
index.php
content/census/1891/index.php
content/census/1901/census1901.php
I've added a php function which takes the page argument in the GET array and checks that against a list of acceptable values. If there is a match, then an appropriate path-and-file name is returned by the function; if not, then an empty string is returned by the function. The calling pages already test for an empty string and generate an error message, so I think I've implemented the "better approach".
The calling files I modified are:
http://vihistory.uvic.ca/content/census/1901/census1901.php
http://vihistory.uvic.ca/content/census/1891/census1891.php
http://vihistory.uvic.ca/content/census/1881/census1881.php
http://vihistory.uvic.ca/content/bd/1882/bd1882.php
http://vihistory.uvic.ca/content/bd/1892/bd1892.php
http://vihistory.uvic.ca/content/bd/1902/bd1902.php
http://vihistory.uvic.ca/content/documents/article.php
Those pages now invoke a confirm_page_ok function, which is found in a lib file on the site.
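For the record, the whitelist approach amounts to something like the sketch below. The page keys and include paths here are made up for illustration; the real confirm_page_ok() in the site's lib file has its own list of acceptable values and differs in detail.
<?php
// Sketch of the whitelist idea only, not the site's actual implementation.
function confirm_page_ok($page)
{
    // Hypothetical page keys mapped to the files they are allowed to include.
    $allowed = array(
        'intro'   => 'content/census/1901/intro.inc',
        'results' => 'content/census/1901/results.inc',
    );
    return isset($allowed[$page]) ? $allowed[$page] : '';
}

// Calling page: include only what the whitelist returns.
$file = confirm_page_ok(isset($_GET['page']) ? $_GET['page'] : '');
if ($file == '') {
    echo 'The requested page could not be found.';  // existing error handling
} else {
    include $file;
}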
Technical details of cause of problem included in helpdesk RT #162668.
I've spent a good part of the day reviewing David's code, trying to solve the problems you've reported when:
1) user does a query and gets a report
2) user clicks on family and gets a new report (also happens when user clicks on location)
3) user clicks on "next" button
The code generating the page is not including the values from the search form. A similar problem occurs if you click on the re-sort buttons at the top of each column.
searchcensus.php
/*
SA added these conditions so that if user clicks on Family or Location, then the results navbar
on the resulting generated page includes the appropriate values derived from the GET array in the
URLs of the navigation buttons
*/
if ((isset($family_number)) && (!empty($family_number))) {
$args .= "&familynumber=" . $family_number;
}
if ((isset($district_id)) && (!empty($district_id))) {
$args .= "&districtid=" . $district_ids;
}
if ((isset($district)) && (!empty($district))) {
$args .= "&district=" . urlencode($district);
}
if ((isset($subdistrict_id)) && (!empty($subdistrict_id))) {
$args .= "&subdistrictid=" . $subdistrict_id;
}
if ((isset($year_got)) && (!empty($year_got))) {
$args .= "&year=" . trim($year_got);
}
censusresults.inc:
//SA 0912007 trying to get the number of records/page to show up on generated page when user
//clicks on Family or Location
$family_url .= "&pagerows=" . $rows_page ;