HCMC Journal

Monument 2023-12-04 to 2023-12-08

to : Martin Holmes
Minutes: 1280

On Monday, got three new spreadsheets of bilingual names, merge/deletes, and born-post-adds. Fixed some duplicates in them and merged all of them. We’re still well above 26,000 people, because the born-post additions offset most of the merges, but there are still 1,684 people with matching names. There are over 3,000 people born after uprooting. 1,318 people are excluded, so the total name count right now stands at 25,212. Also started planning a possible presentation for DH2024.

On Tuesday, got the first list of disambiguations; there were 1,595, and they were all integrated into the collection without any problems using the XSLT I wrote the day before. Then added two new diagnostics and improved the stats a little. We seem to be down to a handful of remaining problem cases.

I also started work on a draft of a submission for DH next year, which is now about half done.

On Wednesday, got new batches of updates from SI and processed the spreadsheet ones, then made the other dozen or so changes manually. Most diagnostics are clear at this point; only 24 name-collisions are awaiting disambiguation. Then found some weird cases where people were their own parents, so there’s now a diagnostic to catch those, and we’ll fix them in due course. After that, did a bunch of manual fixes to surnames and IDs per AB.
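For illustration only, the own-parent check can be sketched in a few lines of Python. This is not the project's actual diagnostic (the real work here is done in XSLT against the collection); the dictionary layout and the name `find_self_parents` are hypothetical, chosen just to show the idea.

```python
def find_self_parents(people):
    """Return the ids of people who list themselves as one of their own parents.

    `people` is a hypothetical mapping: person id -> list of parent ids.
    """
    return sorted(pid for pid, parent_ids in people.items() if pid in parent_ids)


# Made-up sample data, not taken from the collection.
sample = {
    "p1": ["p2", "p3"],
    "p2": [],
    "p3": ["p3"],  # lists itself as a parent
}
print(find_self_parents(sample))
```

Anything this reports would then go on the list of cases to fix by hand.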

On Thursday, fixed some schema issues and then worked on a small batch of very problematic merge/deletes which raised questions for discussion; couldn’t finish those by the end of the day.

Friday was all-day Monument work. First, worked through the remaining handful of merges from yesterday. Then I manually implemented the first batch of fixes for the multiple-parent problem. I revised the diagnostic for this to make it more sensitive, so it not only catches people with more than two parents, but also cases of more than one mother or more than one father, and also includes adopted children. Now we are down to 95 multiple-parent issues, and SI will work on those over the weekend; I’ll try to implement them as they come in, so we keep the momentum going. At AB’s request, I published an update to the public site. This is the first update to include the CSS changes from my vacation week.
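As a rough sketch of what the revised multiple-parent diagnostic checks, here is the same logic in Python. The data layout, the role labels "mother" and "father", and the function name `multiple_parent_issues` are all assumptions made for the example; the real diagnostic runs against the collection itself. The sketch makes no exception for adopted children, which is one reading of "also includes adopted children".

```python
from collections import Counter


def multiple_parent_issues(parent_links):
    """Flag children with suspicious parent sets.

    `parent_links` is a hypothetical mapping:
    child id -> list of (parent id, role) pairs, role being "mother" or "father".
    Returns child id -> list of problem descriptions (only for flagged children).
    """
    issues = {}
    for child, links in parent_links.items():
        roles = Counter(role for _, role in links)
        problems = []
        if len(links) > 2:
            problems.append("more than two parents")
        if roles["mother"] > 1:
            problems.append("more than one mother")
        if roles["father"] > 1:
            problems.append("more than one father")
        if problems:
            issues[child] = problems
    return issues


# Made-up sample data, not taken from the collection.
sample = {
    "a": [("m1", "mother"), ("m2", "mother")],
    "b": [("m1", "mother"), ("f1", "father")],
    "c": [("m1", "mother"), ("f1", "father"), ("f2", "father")],
}
for child, problems in multiple_parent_issues(sample).items():
    print(child, problems)
```

Note that child "a" has only two parents but is still flagged, which is the extra sensitivity the revised diagnostic adds over a simple count.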