Refined parallel approach

December 13th, 2018

While the massively parallel approach appeared to be working on my desktop when run overnight, some processes seem to have crashed near the beginning; on GN's machine the whole thing died within minutes. I've now rewritten it so that only four processes (a configurable parameter) run in parallel at any one time, which should reduce the load on the machine. We're now testing that.
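
As a rough illustration of the capped-concurrency idea (not the actual ant/XSLT setup), here is a minimal Python sketch. The names `run_limited` and `max_parallel` are hypothetical, and the jobs are placeholder callables standing in for the real matching processes; the cap of four mirrors the configurable parameter mentioned above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(jobs, max_parallel=4):
    """Run callables with at most max_parallel active at once.

    Returns (results in submission order, peak number of jobs seen running).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def wrapped(job):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return job()
        finally:
            with lock:
                active -= 1

    # The pool size is the cap: extra jobs wait in a queue until a worker frees up.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(wrapped, jobs))
    return results, peak


def make_job(n):
    """Placeholder job: sleeps briefly, then returns its own index."""
    def job():
        time.sleep(0.01)
        return n
    return job


results, peak = run_limited([make_job(i) for i in range(10)], max_parallel=4)
```

With ten placeholder jobs and a cap of four, all ten still finish, but never more than four run at the same moment, which is the property that should keep the machine from being swamped.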

Parallel approach to generating match data

December 12th, 2018

Since the existing process was taking a very long time (1,652 minutes for 25,600 records), I've devised a revised approach that parallelizes it. Basically, the driver ant file runs the XSLT with a special parameter that causes the XSLT to write a new temporary ant file to act as a parallel driver; the first ant file then calls the temporary one, and several processes are kicked off simultaneously. We're both running this overnight to see how fast it goes. There's obviously a lot of tuning we could do in terms of the task division, so we'll definitely be coming back to this, but since the default process would take up to two weeks, cutting it down is essential.
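
To make the numbers and the task division a little more concrete, here is a small Python sketch. It only works out the per-record cost from the figures above and shows one simple way to split a record set into contiguous ranges; `split_ranges` is a hypothetical helper, and the choice of four parts is an assumption for illustration, not how the generated ant file actually divides the work.

```python
def split_ranges(total, parts):
    """Split `total` records into `parts` contiguous (start, end) ranges.

    Ranges are half-open; any remainder is spread over the first ranges.
    """
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Measured serial cost from the run above: 1,652 minutes for 25,600 records.
seconds_per_record = 1652 * 60 / 25600

# Assumed division into four equal chunks, one per parallel process.
chunks = split_ranges(25600, 4)
```

That works out to just under four seconds per record when run serially, so any speed-up depends almost entirely on how many such ranges can be processed at once without overloading the machine.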

Back to processing incoming data

December 11th, 2018

With GN, examined our original code for importing the two datasets, and started a revamp/rewrite of it, managed by ant. Currently running a full similarity metric test against the latest CGWP version. Will take days. May be able to split and parallelize it.

April 3rd

April 3rd, 2018

Still working through the Ontario locations.
Today I found a person with two entries with different LACIDs.
Joseph Harold Code (pid:834415) - LACID 7900 (which is actually ANDERSON, ARNOLD ALBERT's LACID)
Joseph Harold Code (pid:934457) - second entry has the LACID 107900 which is correct.

March 11 - 15

March 15th, 2018

James Albert Thompson seems to have two entries.
Have been working on Ontario. Down to ~3,100.

Feb 26 - 28

February 28th, 2018

Manitoba is pretty much finished, with 23 entries remaining.

Made a mistake: the match between Victoria, Man. and Holland River, Ont. is wrong and should be removed.
The appropriate match is with the Victoria Rural Municipality, Man.

February 28th 2018

February 28th, 2018
Worked on Quebec, down to 727 places to match.

27 February 2018

February 27th, 2018
- Continued work on Quebec, on the 'M's; 773 left.
- Wilfred Meagher file has incorrect province for Glengarry; should be Ontario. It is corrected on the document but not in CGWP.
- Private John Belt file says Granville, QU; should be transcribed as Georgeville, QU.
- Private Frederick Emerson Sunstrum file incorrectly transcribed as Guyon, QU; should be Quyon, QU.
- Private Omer Sevigny file: POB should be transcribed as Ham-Nord, QU, not Hemmond?, QU.
- Léger Turcotte file says birthplace is Jeune Lorette, QU; should be transcribed as Loretteville, QU.
- Private Ernest David file: POB should be transcribed as Joseph Farm, Maniwaki, QU, rather than Joeseph, QU.
- Oliva Lanouette file: POB should be transcribed as Sainte-Anne-de-la-Pérade, QU, rather than La Prade, Champlain County, Quebec.
- Private Ernest Tremblay file: POB should be transcribed as Lac-Cayamant, Quebec, rather than Longeault, Quebec.

26 February 2018

February 26th, 2018
Worked on Quebec locations up to the end of 'F', with 12 unknowns. John Angus McDonald (Quebec-born, died 1916) has one page of records that belong to a different John Angus McDonald who lived through the war. (End of file)

Feb 20th

February 20th, 2018