18/11/16

Permalink 04:03:55 pm, by Greg, 28 words, 87 views   English (CA)
Categories: Activity log; Mins. worked: 120

Restarting dev

I've moved development to a local VM with 24GB RAM - and it sings!
Starting to do some abstraction by creating separate schemas for app, data, edits, etc.

07/10/16

Permalink 02:18:13 pm, by Greg, 57 words, 88 views   English (CA)
Categories: Activity log; Mins. worked: 300

HISCO descriptions for DB

We've found searching the HISCO site difficult. As we're relying on HISCO encoding, I've scraped the site with a PHP script that produces an XML file, then processed that file with some XSLT to transform it into SQL inserts. This is a package that contains all the scripts, as well as the XML file produced.
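As a rough illustration of the XML-to-SQL step, here is a minimal sketch in Python rather than the XSLT actually used. The XML layout (`entry`, `title`, `description`), the `hisco` table and its columns, and the sample entries are all assumptions made up for the example; the real scraped file may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout with placeholder entries; not the real scraped data.
SAMPLE_XML = """<hisco>
  <entry code="00001">
    <title>Placeholder title one</title>
    <description>Placeholder description.</description>
  </entry>
  <entry code="00002">
    <title>Placeholder title two</title>
    <description>Placeholder mentioning a farmer's own account.</description>
  </entry>
</hisco>"""


def sql_literal(text):
    """Quote a string as a SQL literal, doubling single quotes exactly once."""
    return "'" + text.replace("'", "''") + "'"


def xml_to_inserts(xml_text, table="hisco"):
    """Return one INSERT statement per <entry> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    statements = []
    for entry in root.iter("entry"):
        code = entry.get("code", "")
        title = (entry.findtext("title") or "").strip()
        description = (entry.findtext("description") or "").strip()
        statements.append(
            f"INSERT INTO {table} (code, title, description) VALUES ("
            f"{sql_literal(code)}, {sql_literal(title)}, "
            f"{sql_literal(description)});"
        )
    return statements


for statement in xml_to_inserts(SAMPLE_XML):
    print(statement)
```

Quoting happens in one place (`sql_literal`) so that each value is escaped once and only once.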

04/10/16

Permalink 02:17:32 pm, by Greg, 230 words, 66 views   English (CA)
Categories: Tasks; Mins. worked: 0

Custom HISCO categories

We've been using the slightly modified HISCO codes from VI History, which included levels of specificity that do not exist in the published version of HISCO.

For example, we have
1) a bunch of Managers (Cement Works, Gas Works, Lime Kiln, Mine, Powder Works, Sawmill, and Tanning Company) that *could* all be placed in a more inclusive category, like 22620 "Supervisor and General Foreman (Mining, Quarrying and Well-Drilling)"

2) 50ish posties (postman, postmaster, assistant postmaster, expressman, post office assistant) that could possibly be placed in the Mail Distribution Clerks category (37000-37090) but are listed as 35210, 35211, and 35910.

3) 8 store managers that are listed as HISCO 40020, 40030, 40032, and 40033 but could be handled by the 21000-21990 range

4) auto salesmen/dealers that are listed as 48000, ticket sellers listed as 49090 - TO DO: what to list them as?

5) steamship pursers listed as 50040 - TO DO: what to list them as?

6) housekeepers, stewards, etc. listed as 52020, 52032, and 52040 - TO DO: what to list them as?

7) gentleman farmer, farm manager, and gamekeeper listed as 60040, 60030, and 60020 - TO DO: what to list them as?

8) hunters listed as 65000 - TO DO: what to list them as?

9) foremen of various descriptions listed as 70000 - 70150 - TO DO: what to list them as?

10) 1994 funky ones (X0000, X0010, X0015, X2000, X2280, X3010, Y1100, Y1105, Y1123, Y1125, Y1200) that correspond to mostly soldiers, gentlemen, cutters, and assorted oddballs (illegible, trailing ?) - TO DO: what to list them as?

Permalink 09:37:51 am, by Greg, 708 words, 97 views   English (CA)
Categories: Discussion; Mins. worked: 0

Stuff we need

Mapping of Duguid occupation codes to HISCO. Here's a quick take on it:
AG Agriculture.....................HISCO major group 6 - Agricultural, Animal Husbandry And Forestry Workers, Fishermen And Hunters
BT Building Trades.................HISCO minor group 95 - Bricklayers, Carpenters and Other Construction Workers
CM Civil and Municipal.............HISCO minor group 20 - Legislative Officials and Government Administrators
CL Clerical........................HISCO major group 3 - Clerical And Related Workers
DP Domestic and personal services..HISCO major group 5 - Service Workers
EF Engineers, Firemen..............HISCO minor group 98 - Transport Equipment Operators
FR Forestry........................HISCO minor group 63 - Forestry Workers
HF Hunting and fishing.............HISCO minor group 64 - Fishermen, Hunters and Related Workers
LA Labourers.......................HISCO minor group 99 - Labourers Not Elsewhere Classified
MA Manufacturing...................HISCO major groups 7/8/9 - Production And Related Workers
ME Mechanics.......................HISCO major group 8 - Production And Related Workers, Transport Equipment Operators And Labourers
TR Mercantile......................HISCO major group 4 - Sales Workers
MI Mining..........................HISCO minor group 71 - Miners, Quarrymen, Well Drillers and Related Workers
PE Printers, Engravers.............HISCO minor group 92 - Printers and Related Workers
PR Professional....................HISCO major groups 0/1 - Professional, Technical And Related Workers
ST Students........................?
TN Transportation..................HISCO minor group 98 - Transport Equipment Operators
OU Other...........................?
NA Unknown.........................?
UN Not Specified...................?

On this level of granularity, HISCO offers the following:
[0/1] Professional, Technical And Related Workers
[2] Administrative And Managerial Workers
[3] Clerical And Related Workers
[4] Sales Workers
[5] Service Workers
[6] Agricultural, Animal Husbandry And Forestry Workers, Fishermen And Hunters
[7/8/9] Production And Related Workers, Transport Equipment Operators And Labourers

I think the real value of the HISCO coding is at the next level down, though. Under each major group heading below we have minor groups that we will be able to sort on. So, we can search specifically for jewellers and find people that attested as jeweller, goldsmith, silversmith, whitesmith, etc.

Professional, Technical And Related Workers
[01] Physical Scientists and Related Technicians
[02 or 03] Architects, Engineers and Related Technicians
[04] Aircraft and Ships' Officers
[05] Life Scientists and Related Technicians
[06 or 07] Medical, Dental, Veterinary and Related Workers
[08] Statisticians, Mathematicians, Systems Analysts and Related Technicians
[09] Economists
[11] Accountants
[12] Jurists
[13] Teachers
[14] Workers in Religion
[15] Authors, Journalists and Related Writers
[16] Sculptors, Painters, Photographers and Related Creative Artists
[17] Composers and Performing Artists
[18] Athletes, Sportsmen and Related Workers
[19] Professional, Technical and Related Workers Not Elsewhere Classified

Administrative And Managerial Workers
[20] Legislative Officials and Government Administrators
[21] Managers
[22] Supervisors, Foremen and Inspectors

Clerical And Related Workers
[30] Clerical and Related Workers, Specialisation Unknown
[31] Government Executive Officials
[32] Stenographers, Typists and Card‑ and Tape‑Punching Machine Operators
[33] Bookkeepers, Cashiers and Related Workers
[34] Computing Machine Operators
[36] Transport Conductors
[37] Mail Distribution Clerks
[38] Telephone and Telegraph Operators
[39] Clerical and Related Workers Not Elsewhere Classified

Sales Workers
[41] Working Proprietors (Wholesale and Retail Trade)
[42] Buyers
[43] Technical Salesmen, Commercial Travellers and Manufacturers' Agents
[44] Insurance, Real Estate, Securities and Business Services Salesmen and Auctioneers
[45] Sales Workers Not Elsewhere Classified

Service Workers
[51] Working Proprietors (Catering, Lodging and Leisure Services)
[53] Cooks, Waiters, Bartenders and Related Workers
[54] Maids and Related Housekeeping Service Workers Not Elsewhere Classified
[55] Building Caretakers, Charworkers, Cleaners and Related Workers
[56] Launderers, Dry-Cleaners and Pressers
[57] Hairdressers, Barbers, Beauticians and Related Workers
[58] Protective Service Workers
[59] Service Workers Not Elsewhere Classified

Agricultural, Animal Husbandry And Forestry Workers, Fishermen And Hunters
[61] Farmers
[62] Agricultural and Animal Husbandry Workers
[63] Forestry Workers
[64] Fishermen, Hunters and Related Workers

Production And Related Workers, Transport Equipment Operators And Labourers
[71] Miners, Quarrymen, Well Drillers and Related Workers
[72] Metal Processors
[73] Wood Preparation Workers and Paper Makers
[74] Chemical Processors and Related Workers
[75] Spinners, Weavers, Knitters, Dyers and Related Workers
[76] Tanners, Fellmongers and Pelt Dressers
[77] Food and Beverage Processors
[78] Tobacco Preparers and Tobacco Product Makers
[79] Tailors, Dressmakers, Sewers, Upholsterers and Related Workers
[80] Shoemakers and Leather Goods Makers
[81] Cabinetmakers and Related Woodworkers
[82] Stone Cutters and Carvers
[83] Blacksmiths, Toolmakers and Machine Tool Operators
[84] Machinery Fitters, Machine Assemblers and Precision-Instrument Makers (except Electrical)
[85] Electrical Fitters and Related Electrical and Electronics Workers
[86] Broadcasting Station and Sound Equipment Operators and Cinema Projectionists
[87] Plumbers, Welders, Sheet Metal and Structural Metal Preparers and Erectors
[88] Jewellery and Precious Metal Workers
[89] Glass Formers, Potters and Related Workers
[90] Rubber and Plastics Product Makers
[91] Paper and Paperboard Products Makers
[92] Printers and Related Workers
[93] Painters
[94] Production and Related Workers Not Elsewhere Classified
[95] Bricklayers, Carpenters and Other Construction Workers
[96] Stationary Engine and Related Equipment Operators
[97] Material Handling and Related Equipment Operators, Dockers and Freight Handlers
[98] Transport Equipment Operators
[99] Labourers Not Elsewhere Classified
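
To make the sorting idea concrete, here is a minimal sketch in Python of pulling the major and minor group out of a code. It assumes that the first two digits of a five-digit HISCO code give the minor group listed above, and that custom codes such as X0000 belong to no group; the function names and the sample codes are invented for the example.

```python
def hisco_groups(code):
    """Return (major_group, minor_group) for a five-character HISCO code.

    Codes that do not start with two digits (for example custom codes such
    as X0000) return (None, None).
    """
    code = str(code).strip()
    if len(code) != 5 or not code[:2].isdecimal():
        return (None, None)
    # Major groups 0/1 and 7/8/9 are combined in the listing above.
    combined = {"0": "0/1", "1": "0/1", "7": "7/8/9", "8": "7/8/9", "9": "7/8/9"}
    major = combined.get(code[0], code[0])
    return (major, code[:2])


def in_minor_group(codes, minor_group):
    """Keep only the codes that fall in the given two-digit minor group."""
    return [c for c in codes if hisco_groups(c)[1] == minor_group]


# Illustrative codes only, not taken from the database.
sample_codes = ["88010", "88050", "61110", "X0000", "37030"]
print(in_minor_group(sample_codes, "88"))  # ['88010', '88050']
print(hisco_groups("37030"))               # ('3', '37')
```

A search for jewellers then becomes a filter on minor group 88, whatever occupation string the person attested with.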

27/09/16

Permalink 02:40:52 pm, by Greg, 54 words, 50 views   English (CA)
Categories: Tasks; Mins. worked: 0

Importing more data

Importing the CGWP records that have matching CVWM rows produces about 10,600 results.
Of those, 173 seem to be duplicates.
We need someone to go through them and weed out dupes.
This is the SQL to get those records:

-- Rows in transfer whose cwgc_id appears more than once
SELECT *
FROM transfer ou
WHERE (SELECT count(*)
       FROM transfer inr
       WHERE inr.cwgc_id = ou.cwgc_id) > 1;

21/09/16

Permalink 12:57:48 pm, by mholmes, 113 words, 76 views   English (CA)
Categories: Activity log; Mins. worked: 240

Matching CVWM data with existing records

The CVWM data overlaps to some degree with LAC and CGWP data, but there are likely many records in CVWM (some with counterparts in CGWP) that are not yet in the current database, either because they don't have a match in CGWP or because their CGWP match doesn't have an LAC id. This morning I wrote and tested a comparison tool which detects all these records and outputs their details in such a way that the obviously-needed ones can be identified immediately and their data automatically imported into the current db, while more problematic matches can be inspected by a human. It's running now, and is expected to take a couple more hours.

16/09/16

Permalink 09:54:59 am, by Greg, 69 words, 51 views   English (CA)
Categories: Activity log; Mins. worked: 60

Data mangling

When I imported the data I seem to have run strings through an escape routine twice, producing results like Jack O''Connor (note the doubled single quote). Replacing them was a bit of a hassle, but the logic is pretty simple:
UPDATE person SET surname = regexp_replace(surname, '''''', '''', 'g') WHERE surname like '%''''%';

I then had to refresh the materialized view:
REFRESH MATERIALIZED VIEW person_mv;
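
For anyone puzzled by all the quotes, here is a minimal sketch in Python of how the doubling happens and how the repair undoes it. SQLite stands in for PostgreSQL so the example runs anywhere, and the one-column `person` table and the `escape_quotes` helper are simplifications for the example, not the import code itself.

```python
import sqlite3


def escape_quotes(value):
    """Naive SQL escaping: double every single quote."""
    return value.replace("'", "''")


surname = "O'Connor"
once = escape_quotes(surname)   # O''Connor: correct inside a SQL literal
twice = escape_quotes(once)     # O''''Connor: the accidental second pass

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (surname TEXT)")

# Building the statement by hand from the twice-escaped string stores
# a doubled quote in the column.
conn.execute(f"INSERT INTO person (surname) VALUES ('{twice}')")
stored = conn.execute("SELECT surname FROM person").fetchone()[0]
print(stored)  # O''Connor

# Same idea as the UPDATE above: inside a SQL literal, '''''' is two quote
# characters and '''' is one.
conn.execute(
    "UPDATE person SET surname = replace(surname, '''''', '''') "
    "WHERE surname LIKE '%''''%'"
)
repaired = conn.execute("SELECT surname FROM person").fetchone()[0]
print(repaired)  # O'Connor

# Passing the raw value as a bound parameter avoids manual escaping entirely.
conn.execute("INSERT INTO person (surname) VALUES (?)", (surname,))
```

Bound parameters are probably the safer habit for future imports, since the driver then handles the quoting.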

15/09/16

Permalink 10:42:08 am, by Greg, 220 words, 42 views   English (CA)
Categories: Activity log; Mins. worked: 20

Letters, newspaper mentions, etc.

Lots of individual records are linked to letters, newspaper articles, etc. through the soldierlinks table.
For example, Dick Irvin <http://canadiangreatwarproject.com/searches/soldierDetail.asp?ID=70961> is linked to this letter <http://canadiangreatwarproject.com/transcripts/transcriptDisplay.asp?Type=L&Id=229>. The transcripts are found in the generaltext table.

The logic seems to be: when loading an individual's record, look in the soldierlinks table for the soldierID and retrieve the linkIDs associated with it. Then look up each linkID in the transcriptions table, which contains components for the HTML output. When users click the link (in the linkAuthor field) they see the letter, or what have you, from the generaltext table. The generaltext table entries have an indexVal field which corresponds to the overall letter. A letter may be made up of n entries (as below) in the generaltext table.

The letter (above) is actually three entries (delimited by the horizontal rule in the HTML output). The first is entryType LTRP, which I take to be a preamble. The second bit is an entryType LTRM, which I will call the main letter. The last bit is an LTRS entryType (SIC, maybe?) which appears to be a correction of portions of the letter.

The soldierlinks table has several link types: LTR (letter), WD (war diary), and NWS (newspaper).
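
The lookup described above can be sketched as follows, in Python with an in-memory SQLite database. The table names and the soldierID, linkID, linkAuthor, indexVal and entryType fields come from the notes above; everything else is an assumption for illustration: the linkType, seq and body columns, the idea that transcriptions carries the indexVal used to find the generaltext rows, and all the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified schema; the real tables have more columns.
CREATE TABLE soldierlinks (soldierID INTEGER, linkID INTEGER, linkType TEXT);
CREATE TABLE transcriptions (linkID INTEGER, linkAuthor TEXT, indexVal INTEGER);
CREATE TABLE generaltext (indexVal INTEGER, entryType TEXT, seq INTEGER, body TEXT);

-- Placeholder rows: one letter made up of three generaltext entries.
INSERT INTO soldierlinks VALUES (70961, 1, 'LTR');
INSERT INTO transcriptions VALUES (1, 'placeholder link text', 229);
INSERT INTO generaltext VALUES (229, 'LTRP', 1, 'placeholder preamble');
INSERT INTO generaltext VALUES (229, 'LTRM', 2, 'placeholder main letter');
INSERT INTO generaltext VALUES (229, 'LTRS', 3, 'placeholder corrections');
""")


def linked_documents(conn, soldier_id):
    """Follow soldierlinks -> transcriptions -> generaltext for one soldier."""
    return conn.execute(
        """
        SELECT sl.linkType, t.linkAuthor, g.entryType, g.body
        FROM soldierlinks sl
        JOIN transcriptions t ON t.linkID = sl.linkID
        JOIN generaltext g ON g.indexVal = t.indexVal
        WHERE sl.soldierID = ?
        ORDER BY sl.linkID, g.seq
        """,
        (soldier_id,),
    ).fetchall()


for row in linked_documents(conn, 70961):
    print(row)
```

If the real join between transcriptions and generaltext works differently, only the second JOIN should need to change.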

14/09/16

Permalink 02:23:08 pm, by Greg, 143 words, 39 views   English (CA)
Categories: Tasks; Mins. worked: 0

Things to do

  1. Incorporate other data - first pass done (added 7800)
  2. Incorporate images into main soldier display - done
  3. Incorporate awards into main soldier display - done
  4. Add ranks-held listing in details page - done
  5. Drill-down searches for POE, POB, Cemetery, Awards, Memorials, Regiment/Battalion, etc.
  6. Prose for main page, about
  7. Placeholder text for the rest
  8. Search help for search page
  9. Future plans/features for search page, war diaries page, stats page, etc. (some of the static content can be put here, like the stats on http://canadiangreatwarproject.com/writing/cefStats.asp)
  10. New top level page(?) for 'essays' - e.g. the 'Canada in the War' section of existing site
  11. New top level page for 'Collections' - to contain 'From the Front' materials
  12. Widgets: famous Canadians, soldiers died on this date/today in history
  13. Convert date_of_death to date field - done

11/08/16

Permalink 02:10:30 pm, by Greg, 91 words, 54 views   English (CA)
Categories: Activity log; Mins. worked: 60

Date of birth

We have quite a few people who declared a date of birth that doesn't meet PostgreSQL's notion of a date. I coped with this by creating a date_of_birth field that contains an array of valid dates, and another field that contains an array of strings (the 'bad' dates).

I noticed that quite a few of them could be fixed, so I edited 115 of the easiest ones to check. There are still ~575 records with malformed birthdates, but quite a few of those will be easy to fix as well.
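
The valid/bad split can be sketched in a few lines of Python. This is only an illustration: it assumes the declared dates arrive as YYYY-MM-DD strings, which may not match the source data, and the `split_dates` name and sample values are made up.

```python
from datetime import datetime


def split_dates(declared, fmt="%Y-%m-%d"):
    """Split declared date strings into (valid dates, unparseable strings)."""
    good, bad = [], []
    for raw in declared:
        try:
            good.append(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            # Keep the original string so a human can fix it later.
            bad.append(raw)
    return good, bad


# Illustrative values: 1895 was not a leap year, and month 00 does not exist.
good, bad = split_dates(["1895-02-29", "1895-03-01", "1890-00-15", "1888-12-31"])
print(good)
print(bad)  # ['1895-02-29', '1890-00-15']
```

Keeping the rejected strings verbatim is what makes the later hand-fixing possible.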
