HCMC Journal

Monument project 2022-12-19 to 2022-12-23

to : Martin Holmes
Minutes: 605

On Monday, started work on the code which will eventually merge families once the merges have been approved. This code needs to be quite cautious and to flag up cases for further attention where necessary, so I’m going slowly with it. If I can have it working by Wednesday and we can get a list of preliminary merge ids from NA by then, I can generate a crude searchable site by Friday, but that might not actually be practical. I mapped out a pseudo-code version.

On Tuesday, started to build the comparison functions that will be needed to do the merging; I extracted the original functions from the match_families.xsl lib into a module which I can also include in the merge_families module, and as I write new functions I’m adding tests for them in the XSpec file.

On Wednesday I wrote a function and tests to compare two primaries and report in a useful fashion on the matching relationships between their children; that should provide enough information for the merge processing code to take all the actions it needs to in the main merge_families module.
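To illustrate the kind of report such a comparison might produce, here is a minimal sketch in Python (the real function lives in the XSLT module and is tested with XSpec; it is not reproduced here). Everything in it is an assumption made for illustration: children are represented as plain name strings, matching is by whitespace- and case-normalized name, and `ChildComparison`, `compare_children`, and the `needs_review` rule are hypothetical names, not the project's own.

```python
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


@dataclass
class ChildComparison:
    """Report on how the children of two primary records line up."""
    matched: list = field(default_factory=list)    # (child_a, child_b) pairs
    only_in_a: list = field(default_factory=list)  # children found only under primary A
    only_in_b: list = field(default_factory=list)  # children found only under primary B

    @property
    def needs_review(self) -> bool:
        # Hypothetical rule: if BOTH sides have unmatched children, the two
        # families may not be the same one, so flag the case for a human.
        return bool(self.only_in_a) and bool(self.only_in_b)


def compare_children(children_a, children_b) -> ChildComparison:
    """Pair up children by normalized name and collect the leftovers."""
    report = ChildComparison()
    remaining_b = list(children_b)
    for child_a in children_a:
        key = normalize(child_a)
        match = next((c for c in remaining_b if normalize(c) == key), None)
        if match is None:
            report.only_in_a.append(child_a)
        else:
            report.matched.append((child_a, match))
            remaining_b.remove(match)  # each child in B can be matched only once
    report.only_in_b = remaining_b
    return report
```

A report shaped like this gives the merging step three things to act on: pairs to merge, children to carry over from either side, and a flag for cases that need a closer look.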

On Thursday, continued with the record-merging algorithm, which is very tricky. Finished a full pass through it by lunchtime, but it is completely untested, so I don’t expect it to work yet.

On Friday, continued that work until I had full tests and debugging done, at which point I began to discover errors in the original data. These led me to more bugs in my earlier name-extraction code, which was failing to allow for multiple forenames: there are cases where children have both Western and Japanese names, and therefore don’t follow the expected pattern of one forename and one surname. I fixed that bug.
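The multiple-forename problem can be sketched as follows, again in Python rather than the project's XSLT. This assumes, purely for illustration, that a name is a space-separated string with the surname last; the actual data format and the actual fix are not shown in this entry, and `ParsedName` and `parse_name` are hypothetical names.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    forenames: tuple  # zero or more forenames, in their original order
    surname: str


def parse_name(full_name: str) -> ParsedName:
    """Split a name into forenames and surname.

    Assumption: the last token is the surname and every token before it
    is a forename, so a child with both a Western and a Japanese forename
    is handled without requiring exactly two tokens.
    """
    tokens = full_name.split()
    if not tokens:
        raise ValueError("empty name")
    *forenames, surname = tokens
    return ParsedName(tuple(forenames), surname)
```

For example, `parse_name("George Hiroshi Tanaka")` keeps both forenames rather than failing or treating `Hiroshi` as part of the surname.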

At close of play on the 23rd I have a working process that can generate the names, identify potential matches, and successfully merge all the matches, with specific annotations for cases where further research is required. The number of individual distinct names is down to fewer than 27,000, which is good progress.