Progress with the Monument project
to : Martin Holmes
Minutes: 200
Finally figured out the issues with the regular expression that were plaguing me, and was able to go on with processing the parsed-out data. I now have a 33MB file generated with over 64,000 individual person records, which of course includes a lot of overlap that will have to be fixed with disambiguation. There’s a lot more I can do in terms of identifying correspondences and likely duplicates, and even merging families, but I’m waiting for guidance on how the project would like to proceed with this dataset.
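As a rough illustration of the kind of duplicate detection mentioned above, here is a minimal sketch that groups records sharing a crude matching key. The record layout (dicts with `id`, `surname`, `forename` and `birth` fields) and the choice of key are assumptions made for the example, not the project's actual data model or method.

```python
from collections import defaultdict
import unicodedata


def normalise(text):
    """Lower-case, strip accents and drop anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in stripped.lower() if c.isalnum())


def name_key(record):
    """Crude matching key (hypothetical): surname, forename initial, birth year."""
    forename = normalise(record.get("forename", ""))
    return (normalise(record.get("surname", "")), forename[:1], record.get("birth"))


def likely_duplicates(records):
    """Return lists of record ids that share the same matching key."""
    groups = defaultdict(list)
    for rec in records:
        groups[name_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records, for illustration only.
sample = [
    {"id": "p1", "surname": "Dupré", "forename": "Jean", "birth": "1801"},
    {"id": "p2", "surname": "Dupre", "forename": "J.", "birth": "1801"},
    {"id": "p3", "surname": "Martin", "forename": "Anne", "birth": "1820"},
]
candidates = likely_duplicates(sample)
```

A key like this only proposes candidates; each group would still need checking before any records are merged.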
To encode the relationships, I had to add <listRelation> and <relation> to the schema, but the schema build process was broken because paths have changed on the TEI Jenkins machine, so I had to debug and fix that. Then the diagnostics broke because they were attempting to process the new <person> elements, so I’ve fixed that.
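For reference, a minimal sketch of how relationships between person records might be serialised as a TEI <listRelation>. The `name`, `active`, `passive` and `mutual` attributes are standard on TEI <relation>; the relationship names, the person ids and the input dict layout are assumptions made for the example, not the project's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def build_list_relation(relations):
    """Build a TEI <listRelation> element from a list of relation dicts.

    Each dict has a "name" and either a "mutual" list of person ids
    (symmetric relationship) or an "active"/"passive" pair (directed one).
    This input layout is hypothetical.
    """
    list_rel = ET.Element(f"{{{TEI_NS}}}listRelation")
    for rel in relations:
        attrs = {"name": rel["name"]}
        if "mutual" in rel:
            # Symmetric relationship: all participants go in @mutual.
            attrs["mutual"] = " ".join(f"#{pid}" for pid in rel["mutual"])
        else:
            # Directed relationship: @active is related to @passive.
            attrs["active"] = f"#{rel['active']}"
            attrs["passive"] = f"#{rel['passive']}"
        ET.SubElement(list_rel, f"{{{TEI_NS}}}relation", attrs)
    return list_rel


# Hypothetical relationships between hypothetical person ids.
list_relation = build_list_relation([
    {"name": "spouse", "mutual": ["p1", "p2"]},
    {"name": "parent", "active": "p1", "passive": "p3"},
])
xml_text = ET.tostring(list_relation, encoding="unicode")
```

Pointing at the <person> elements by id keeps the relationships separate from the person records themselves, so merging duplicates later only means updating the pointers.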