HCMC Journal

Monument: Starting work on PDF checker

: Martin Holmes
Minutes: 365

The first part of the next phase of the Monument work is to create an automated checker that compares the PDF that will be sent to the stone creators against the source namelist we generate, to confirm that everyone is there and in the right place, and that no-one is there who shouldn’t be. I started work on this today, writing code to extract the names and places from the PDF itself and turn them into a well-structured XML file that is easy to use in comparison processes, and then extracting a parallel name list from the Monument dataset. By the end of the day, all the comparisons were being made, but because of an anomaly in the way the PDF is formatted, the checker finds one name missing on either side. More work is needed there, but as long as they don’t change the PDF format, this codebase should be sufficient.
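To illustrate the kind of comparison involved, here is a minimal sketch in Python. It is not the actual checker code: the `Entry` type, the `normalize` and `compare` functions, and the sample names and panel labels are all made up for illustration, and it assumes each name appears only once in each list.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One name and where it appears (hypothetical structure)."""
    name: str
    place: str  # assumed to be something like a panel or column label


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case, so that layout
    differences in the PDF text do not count as mismatches."""
    return " ".join(text.split()).casefold()


def compare(pdf_entries, source_entries):
    """Compare the names extracted from the PDF with the source namelist.

    Assumes names are unique within each list; a real dataset with
    duplicate names would need a richer key than the name alone.
    """
    pdf = {normalize(e.name): e for e in pdf_entries}
    src = {normalize(e.name): e for e in source_entries}

    # In the source namelist but not found in the PDF.
    missing_from_pdf = sorted(src[k].name for k in src.keys() - pdf.keys())
    # In the PDF but not in the source namelist.
    unexpected_in_pdf = sorted(pdf[k].name for k in pdf.keys() - src.keys())
    # Present on both sides, but in different places.
    misplaced = sorted(
        src[k].name
        for k in src.keys() & pdf.keys()
        if normalize(pdf[k].place) != normalize(src[k].place)
    )
    return {
        "missing_from_pdf": missing_from_pdf,
        "unexpected_in_pdf": unexpected_in_pdf,
        "misplaced": misplaced,
    }


# Made-up sample data, not taken from the Monument dataset.
source = [
    Entry("Ada Example", "Panel 1"),
    Entry("Ben Sample", "Panel 2"),
    Entry("Cy Test", "Panel 3"),
]
pdf = [
    Entry("Ada  Example", "Panel 1"),  # extra space, as PDF extraction might produce
    Entry("Ben Sample", "Panel 3"),    # wrong place
    Entry("Dee Extra", "Panel 4"),     # not in the source list
]

report = compare(pdf, source)
for problem, names in report.items():
    print(problem, names)
```

A formatting anomaly of the sort mentioned above would typically show up in a report like this as a matched pair: one entry under `missing_from_pdf` and one under `unexpected_in_pdf`, where the same name has been extracted slightly differently on one side.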