Progress with OAI code
I've re-focused on the task of generating and storing the OAI records in the database, in such a way that they can be updated easily whenever the db contents change. I've written a library called oai_update.xq, which has the original record-generating code from my first attempt, but massaged a bit so that it uses explicit namespace prefixes for TEI; this is necessary because we need to generate the record fragments in no namespace, so it's easier if we don't have a default one. I also fixed a couple of bugs which emerged when I tested my code on the whole 7000+ documents. This is what it does:
- For each record in the correspondence collection, it checks whether there's an OAI record.
- If there isn't, it generates one.
- If there is, it compares the modified date on the OAI record against that of the original correspondence record, and if the OAI record is older, it deletes the OAI record and generates a new one.
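The check above can be sketched roughly as follows. This is an illustration in Python, not the XQuery in oai_update.xq: plain dicts stand in for the database collections, and the names `build_oai_record` and `update_oai_records`, the record fields, and the idea of stamping each OAI record with the source's modified date are all assumptions made for the example.

```python
from datetime import datetime

def build_oai_record(doc_id, source):
    # Hypothetical stand-in for the real record-generating code.
    # Assumption: the OAI record carries the source's modified date.
    return {"id": doc_id, "title": source["title"], "modified": source["modified"]}

def update_oai_records(correspondence, oai_records):
    """Create missing OAI records and replace stale ones.

    Both arguments are dicts keyed by document id (an assumption).
    Returns the ids whose OAI record was generated or regenerated.
    """
    regenerated = []
    for doc_id, source in correspondence.items():
        existing = oai_records.get(doc_id)
        # Generate if there is no OAI record, or if it is older than the source.
        if existing is None or existing["modified"] < source["modified"]:
            oai_records[doc_id] = build_oai_record(doc_id, source)
            regenerated.append(doc_id)
    return regenerated

# Made-up sample data: "a" is stale, "b" is missing, "c" is up to date.
correspondence = {
    "a": {"title": "Letter A (revised)", "modified": datetime(2010, 12, 20)},
    "b": {"title": "Letter B", "modified": datetime(2010, 12, 1)},
    "c": {"title": "Letter C", "modified": datetime(2010, 11, 1)},
}
oai_records = {
    "a": {"id": "a", "title": "Letter A", "modified": datetime(2010, 10, 1)},
    "c": {"id": "c", "title": "Letter C", "modified": datetime(2010, 11, 1)},
}
changed = update_oai_records(correspondence, oai_records)
```

Running the same function a second time regenerates nothing, which is the property that makes it safe to schedule as a periodic task.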
This is what it's not yet doing:
- Removing OAI documents for any correspondence documents that no longer exist (occasionally we remove a document when we find a duplicate). This will be fairly easy to do.
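One way the missing cleanup step might look, again as a Python sketch rather than the eventual XQuery. The function name `remove_orphaned_oai_records` and the dict-based storage are assumptions for illustration only.

```python
def remove_orphaned_oai_records(correspondence_ids, oai_records):
    """Delete OAI records whose source document no longer exists.

    correspondence_ids: iterable of ids currently in the collection.
    oai_records: dict of OAI records keyed by document id (an assumption).
    Returns the sorted list of ids that were removed.
    """
    orphans = sorted(set(oai_records) - set(correspondence_ids))
    for doc_id in orphans:
        del oai_records[doc_id]
    return orphans

# Made-up sample data: "dup" was removed from the collection as a duplicate.
current_ids = ["a", "b"]
stored = {"a": {"id": "a"}, "b": {"id": "b"}, "dup": {"id": "dup"}}
removed = remove_orphaned_oai_records(current_ids, stored)
```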
As I write this, I'm generating a set of OAI records for the whole up-to-date collection on my local copy of the machine. In the new year, I should be able to dump those and upload them into the live db to pre-populate it. Then I can add the feature above, write sitemap pipelines for the operations, and add them to my set of periodic update tasks. Finally, I can finish the OAI interface, which should be much simpler, since it's using existing records instead of querying source data and constructing records.
Reminder to self: the OAI docs are here.