Category: Notes

29/08/11

Permalink 09:15:36 am, by jamie, 68 words, 126 views   English (CA)
Categories: Notes; Mins. worked: 0

New website live

Thanks to GN's work and correspondence with the systems guys, http://vihistory.ca is now live with the latest version of the site.

This is actually the development version of the site: it's being run from the 'vihdev' user and the 'vihdev' PostgreSQL database. The eventual goal is to migrate the code to taprhist and the DB to viHistory, but that's a fairly low priority at the moment.

25/08/11

Permalink 11:19:11 am, by jamie, 737 words, 117 views   English (CA)
Categories: Notes; Mins. worked: 0

State of the project upon my departure

My work on this site can be put into four different categories:

  • Adding the 1911 table and views to the database, and modifying the views for the other census years to take into account any new fields introduced in 1911
  • Adding the new 1911 census data
  • Modifying the 'loader' application to be more robust, and to accept the CSV data files
  • Making mostly minor changes to the actual website, such as textual changes and some additions to the advanced census search

My Workflow

I did all work, including importing data, on my local machine first, on which I had a complete development environment. After importing the data and testing all changes, I then updated the development server, using phppgadmin.uvic.ca for the database changes and Subversion for the PHP changes.

The Database

In addition to making the census_1911 table, I also had to make new views for the 1911 census, since the website only interacts with the views, rather than the tables, for searching and displaying. I also had to modify the older views to include the new fields (which are NULL in those older views). The search functions on the site use a UNION query to combine all of the views, so every view needs to have the same columns for the search to work properly, even if some of those columns are NULL in a given view. This process is fully documented in David's excellent manual. I didn't change the methods or the way anything works; I just added the new data.
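As a rough illustration of why the older views need NULL placeholder columns, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's PostgreSQL database. The table, view and column names (view_1901, view_1911, has_life_insurance) are invented for the example and are not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified tables; the real schema has many more columns.
    CREATE TABLE census_1901 (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE census_1911 (id INTEGER PRIMARY KEY, surname TEXT, has_life_insurance TEXT);

    -- The older view pads the column it lacks with NULL so both views line up.
    CREATE VIEW view_1901 AS
        SELECT 1901 AS census_year, surname, NULL AS has_life_insurance FROM census_1901;
    CREATE VIEW view_1911 AS
        SELECT 1911 AS census_year, surname, has_life_insurance FROM census_1911;

    INSERT INTO census_1901 (surname) VALUES ('Smith');
    INSERT INTO census_1911 (surname, has_life_insurance) VALUES ('Smith', 'Y');
""")

# A UNION only works when every branch returns the same number of columns.
rows = conn.execute("""
    SELECT census_year, surname, has_life_insurance FROM view_1901 WHERE surname = 'Smith'
    UNION
    SELECT census_year, surname, has_life_insurance FROM view_1911 WHERE surname = 'Smith'
    ORDER BY census_year
""").fetchall()

for row in rows:
    print(row)
```

If view_1901 left out the padded column, the UNION would fail because the two branches would return different numbers of columns.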

The Data

Each sub-district is in its own CSV (and XLS; Patrick usually sent me the data in both formats). Thus, each sub-district was imported separately. This is documented in full on the blog, but here is a short summary of the steps:

  1. Add a new row to the location table, which becomes the location_id of the rows to be imported

  2. Make a new 'mapping' file in the map directory in the loader, following the conventions of the other mapping files

  3. Add any new rows to the auxiliary tables as necessary (occupations, nationalities, etc.) - Patrick supplied these when sending the sub-districts

  4. Import the data with the loader, check for errors, delete and re-import as necessary

  5. Create a dump file and then import into the development database on tapor
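Steps 3 and 4 could in principle be made less painful by checking a CSV for unknown codes before running the import. The following is only a sketch of that idea, not part of the loader: the function name, the column name religion_code and the sample data are all hypothetical (880 is the Sikh religion code mentioned elsewhere on this blog; 999 is made up).

```python
import csv
import io


def find_unknown_codes(csv_text, column, known_codes):
    """Return the sorted values in `column` that are not in `known_codes`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    unknown = set()
    for row in reader:
        value = row[column].strip()
        # Blank cells are treated as "no data" and are not reported.
        if value and value not in known_codes:
            unknown.add(value)
    return sorted(unknown)


# Hypothetical sub-district extract with one code missing from the auxiliary table.
sample = "surname,religion_code\nSmith,880\nJones,999\nLee,\n"
known_religions = {"880"}

missing = find_unknown_codes(sample, "religion_code", known_religions)
print(missing)
```

Any code reported here would need a row in the matching auxiliary table before the import, otherwise the foreign key constraint would reject the record.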

The Loader

Although it had cursory support for CSV files, the loader application was really only built to handle the old Access database format, and then only on a Windows machine. So, I modified it to accept CSV files. I also beefed up the available mapping functions, and changed some of the queries to be more organized. The application isn't 100% complete, but does the job.

The loader.php script itself was a mammoth, so I created a class Loader in inc/loader.php to do some of the heavy lifting and abstract some of the functionality. David had written a short manual for the loader which explains how the mapper works. I filled out the manual with the new mapping functions that I wrote. The basic steps for importing data are:

  1. On the 'Configure' page, enter the full path to the CSV data file in the field 'CSV File (for CSV imports) - absolute path'

  2. On the main page, choose census_1911 for the table name and ensure that “empty table before import” is not checked

  3. For the 'field map' file, choose the map file made for the sub-district

  4. Cross fingers and import! Incorrect foreign keys often cause SQL errors, so it wasn't uncommon for me to delete the new data a few times and re-import to account for new foreign keys

The loader tends to time out and/or run out of memory when processing larger CSVs. If this happens, then it's fine just to re-import because I extended the loader with the option to skip previously entered records, which can be defined in the mapping files (and is documented in the loader doc file).
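The idea behind skipping previously entered records can be sketched as follows. This is not the loader's actual PHP code; the function, the record_no key field and the sample rows are assumptions made for illustration.

```python
def import_rows(rows, existing_keys, key_field, insert):
    """Insert rows whose key is not yet present; return (inserted, skipped)."""
    inserted = skipped = 0
    for row in rows:
        key = row[key_field]
        if key in existing_keys:
            # Already loaded by an earlier, interrupted run.
            skipped += 1
            continue
        insert(row)
        existing_keys.add(key)
        inserted += 1
    return inserted, skipped


table = []    # stands in for the census table
seen = set()  # keys already present in the table
batch = [
    {"record_no": "1", "surname": "Smith"},
    {"record_no": "2", "surname": "Jones"},
]

# First run "times out" after one row; the second run re-reads the whole file.
first_run = import_rows(batch[:1], seen, "record_no", table.append)
second_run = import_rows(batch, seen, "record_no", table.append)
print(first_run, second_run, len(table))
```

Because the second run skips the row that was already loaded, re-importing the full file after a timeout does not create duplicates.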

The loader does not live on the development website; all loading was done on my machine (parsnip).

The Website

Almost all of the changes I made to the website itself (i.e. the .inc pages) were textual changes given to me by PD. I did have to modify some of the search functionality, most significantly re-organizing the boxes and adding some new fields for the 1911 data. These changes were relatively minor and are documented in the blog.

23/08/11

Permalink 03:48:20 pm, by jamie, 49 words, 172 views   English (CA)
Categories: Notes; Mins. worked: 0

Dev site ready to go live

PD has given the green light to make the development site "live", replacing the current site and bringing in all of the changes that we've accomplished over the past month. I've emailed GN to contact sysadmin about this, as he has already talked to them about the development site.

25/07/11

Permalink 12:22:02 pm, by jamie, 21 words, 135 views   English (CA)
Categories: Notes; Mins. worked: 0

1911 data complete

All of the 1911 data has now been imported. I'll be meeting with PD this week to tie up any loose ends.

04/05/11

Permalink 03:33:02 pm, by jamie, 15 words, 116 views   English (CA)
Categories: Notes; Mins. worked: 0

Added new religion and language codes

Received two new codes from PD in preparation for the new 1911 data:

  • Language: Hindi: 6500
  • Religion: Sikh: 880

20/04/11

Permalink 10:29:54 am, by jamie, 169 words, 1182 views   English (CA)
Categories: Notes; Mins. worked: 0

How to fix PostgreSQL error "duplicate key violates unique constraint"

If you get this message when trying to insert data into a PostgreSQL database:

ERROR:  duplicate key violates unique constraint

That likely means that the primary key sequence for the table you're working with has fallen out of sync, probably because of a mass import process (or something along those lines). Call it a "bug by design", but it seems that you have to manually reset a primary key sequence after restoring from a dump file. At any rate, to see if your values are out of sync, run these two commands:

SELECT MAX(the_primary_key) FROM the_table;

SELECT nextval('the_primary_key_sequence');

If the first value is higher than the second value, your sequence is out of sync. Back up your PG database (just in case), then run this:

SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);

That will set the sequence to a value higher than any existing primary key in the table, so the next generated key cannot collide with an existing row.
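To make the logic of the check and the fix concrete, here is a toy Python model of a sequence. It is only an illustration and does not talk to PostgreSQL; the Sequence class and the sample keys are invented. It assumes an increment of 1 and mirrors the documented behaviour that setval with its default arguments makes the following nextval return the given value plus one.

```python
class Sequence:
    """Toy model of a PostgreSQL sequence with increment 1 (illustration only)."""

    def __init__(self, last_value=0):
        self.last_value = last_value

    def nextval(self):
        # Each call consumes a value, including calls made just to "look".
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        # Like setval(seq, value) with the default is_called = true:
        # the following nextval() returns value + 1.
        self.last_value = value
        return value


existing_keys = [1, 2, 3, 4, 5]  # rows restored from a dump
seq = Sequence(last_value=2)     # sequence left behind by the restore

probe = seq.nextval()            # the diagnostic SELECT nextval(...)
out_of_sync = max(existing_keys) > probe

seq.setval(max(existing_keys) + 1)  # the fix from this post
next_key = seq.nextval()
print(probe, out_of_sync, next_key)
```

In this model the fix leaves one unused key (6 is never handed out), which is harmless. Passing MAX(the_primary_key) without the +1 would also be enough, since the next nextval already returns one more than the value set.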

14/04/11

Permalink 11:36:13 am, by jamie, 298 words, 118 views   English (CA)
Categories: Notes; Mins. worked: 60

Meeting with PD

Met with PD today to discuss the integration of the 1911 data thus far, and to go over any changes that need to be made. My task list from the meeting:

  • Fix the occupation and religions table searches, both of which are broken on the development and live sites
  • Add the 1911 information into the locations table:
    • location_02: id 13/"Victoria City"
    • location_03: sub district ID as given by PD (e.g. 10 for Fernwood, 3 for Rock Bay)
  • Remove the "address" field from the advanced search, which is now obsolete thanks to the "street" field
  • Put insurance data into the detail view for 1911 records
  • Change instances of "color" to "colour"
  • Remove "colour" from 1911 views since that data wasn't collected
  • In the summary view of a record, move "Speaks English/French" above "First Language" and "Second Language"
  • Insert missing first/second language information into 1911 data
  • For census record fields that have no information, display a blank field instead of "Unknown" as this glosses over the nuances of the data (i.e. no data vs. illegible or otherwise unknown data)
  • Fix the earnings field in the advanced search, which doesn't seem to work at all
  • Fix a display error in the "Birthdate" field for 1911 records
  • When choosing occupations, races, birthplaces, relationships to head, etc. in the advanced search form, only display values that have corresponding census records
  • Remove erroneous "E" relationship to head and move all records associated with it to "Employee"
  • Add "Hotel" to the building table with an ID of 9
  • Add the amount paid for education field to the 1911 detail view

I am also going to send PD a list of all the annotations for census records submitted by users over the years. He will compile a list of corrections and then we will go through them together to fix the data.

06/04/11

Permalink 12:31:00 pm, by jamie, 99 words, 115 views   English (CA)
Categories: Notes; Mins. worked: 0

Development site online

With help from Greg I've put a development site for VIHistory online at: http://tapor.uvic.ca/~vihdev/

Access is currently restricted by Netlink ID to associated parties (myself, GN, MH, SA, PD, JL). The dev site runs its own database. We put this site online so that PD and JL could "beta test" the new 1911 data once it's ready for them. This also allows me to periodically commit my development changes to the SVN repository, rather than having them sit on my machine for months, in case I get hit by a bus (or suffer another similar calamity).

05/04/11

Permalink 03:13:04 pm, by jamie, 43 words, 126 views   English (CA)
Categories: Notes; Mins. worked: 0

1871 data: not to be integrated

Contrary to my previous post about the advanced census search to-do list, PD, MH and I have agreed that integrating the 1871 data isn't prudent at this time. It's not nearly as complete as the other census years and would probably just confuse users.

04/04/11

Permalink 01:34:04 pm, by jamie, 137 words, 110 views   English (CA)
Categories: Notes; Mins. worked: 0

Changes needed to advanced census search form

PD has sent a list of changes to be done to the advanced census search form, in preparation for the 1911 data:

  • Move address from Name/Family/Location to Building
  • Move infirmities to its own group
  • Change "Race" heading to "Race/Ethnic Origin"
  • Add "1st language commonly spoken" and "2nd language commonly spoken" fields to Language
  • Change 'Building' header to 'Habitation'
  • Add "Street" to aforementioned group to search 1901 and 1911 census
  • Include 1871 as an option
  • Add new "Infirmities/Insurance" header with: Infirmities, Has Life Insurance, and Has Accident/Health Insurance

Most of these are small shuffles, but there are also a few larger jobs: integrating the 1871 data, adding the new Infirmities/Insurance section (most of the work there is massaging the pre-1911 data to make it searchable), and being able to search just by street, rather than address, for 1901 and 1911.


viHistory

viHistory is a web site that is a teaching, learning and research tool. It's principally about the history of Vancouver Island in British Columbia, but it is also a vehicle for exploring the larger field of Canadian history during the late 19th and early part of the 20th century. It allows census, directory and tax assessment roll data from the late 19th and early 20th centuries to be searched in many ways. It also incorporates IMaP to display historical maps. The project director is Dr. Patrick A. Dunae.
