
HCMC Journal

Progress with the Monument project

to : Martin Holmes
Minutes: 200

Finally figured out the problems with the regular expression that had been plaguing me, and was able to move on to processing the parsed-out data. I now have a 33 MB file containing over 64,000 individual person records. These of course include a lot of overlap, which will have to be resolved through disambiguation. There's a lot more I could do to identify correspondences and likely duplicates, and even to merge families, but I'm waiting for guidance on how the project would like to proceed with this dataset.
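The post doesn't say how likely duplicates would be identified. The following is a purely illustrative sketch, not the project's method. It assumes each parsed person record is available as a Python dict with hypothetical keys `id`, `name` and `birth`, and it groups records that share a normalized name and birth year. Each resulting group is only a candidate set for human review, not a confirmed match.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Strip accents and punctuation, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (accents) left over after decomposition.
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a word character or whitespace with a space.
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def candidate_duplicates(records):
    """Group record IDs by (normalized name, birth year).

    `records` is an iterable of dicts with the hypothetical keys
    'id', 'name' and (optionally) 'birth'. Only groups with more
    than one member are returned.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec.get("birth"))
        groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "p1", "name": "John Smith", "birth": "1850"},
    {"id": "p2", "name": "john  smith.", "birth": "1850"},
    {"id": "p3", "name": "Jane Smith", "birth": "1852"},
]
dupes = candidate_duplicates(records)
```

Exact-key grouping like this is cheap enough to run over tens of thousands of records. It will miss spelling variants, though, so a real disambiguation pass would probably need fuzzier matching on top of it.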

To encode the relationships, I had to add <listRelation> and <relation> to the schema. However, the schema build process was broken because paths had changed on the TEI Jenkins machine, so I had to debug and fix that. Then the diagnostics broke because they were trying to process the new <person> elements, so I've fixed that too.
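For readers unfamiliar with the TEI elements mentioned above, here is a minimal sketch of what a generated `<listRelation>` might look like. It uses the standard library's `xml.etree.ElementTree`. `name`, `active` and `passive` are standard TEI attributes on `<relation>`. The relation name `parent`, the person IDs `p1`/`p2` and the helper function names are hypothetical, and this is not the project's actual conversion code.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_list_relation(relations):
    """Build a <listRelation> from (name, active_id, passive_id) tuples.

    IDs are turned into local pointers (#id) to <person xml:id="..."> elements
    assumed to exist elsewhere in the same document.
    """
    list_rel = ET.Element(tei("listRelation"))
    for name, active, passive in relations:
        ET.SubElement(list_rel, tei("relation"), {
            "name": name,
            "active": f"#{active}",
            "passive": f"#{passive}",
        })
    return list_rel


serialized = ET.tostring(
    build_list_relation([("parent", "p1", "p2")]), encoding="unicode"
)
print(serialized)
```

Symmetric relationships such as siblings would more naturally use TEI's `mutual` attribute instead of the `active`/`passive` pair.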