Began work on building name indexes. First I split out the Streets tables into separate files, because the single file is much too big. Next I figured out how to assign unique ids to personal names in the HTML output, while incorporating info about the source document. Finally, I've built a very simple index of linked names from the HTML output, linking to the instances of names in the HTML documents. This is not quite working yet, but it's well on its way.
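A minimal sketch of the id-and-index idea, in Python for illustration only; the real work happens in the build's HTML output, and the id pattern (`<doc>_name_<n>`), the function names and the sample document names here are all assumptions:

```python
import re
from collections import defaultdict


def assign_name_ids(doc_id, names):
    """Pair each name with an id that embeds the source document id (hypothetical pattern)."""
    return [(f"{doc_id}_name_{i}", name) for i, name in enumerate(names, start=1)]


def build_name_index(docs):
    """Map each whitespace-normalized name to links of the form 'doc.html#id'."""
    index = defaultdict(list)
    for doc_id, names in docs.items():
        for name_id, name in assign_name_ids(doc_id, names):
            key = re.sub(r"\s+", " ", name).strip()
            index[key].append(f"{doc_id}.html#{name_id}")
    # Sort by name so the index reads alphabetically.
    return dict(sorted(index.items()))
```

Because the document id is part of each name id, the ids stay unique across files, and an index entry can link straight to the instance in its source document.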
Dates were missing due to a bug; started some minor prettying-up.
I've basically finished all but the cosmetics of creating usable, readable output from the fishing boat ledger spreadsheet; there are now XML and HTML documents for each of the spreadsheet rows, a sortable tabular index, and links to the original page-images. Lots more work to do now on linking similar names together.
Met with AC and SA about the fishing boat ledger data. This is currently in an Excel spreadsheet, which is OK, but we want to make it more accessible, so I've now added a bunch of processing to the build which:
- Uses headless LibreOffice to turn it into a FODS file (I had to install libreoffice-common, libreoffice-writer and libreoffice-calc on the server).
- Cleans up and expands the FODS file a bit.
- Generates an XML document from each record/row.
- Generates an HTML doc from each record.
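The first step in the list above could be sketched roughly as follows. This is a guess at the shape of the call, not the actual build code: the `soffice` binary name, the function names and the file paths are assumptions, and the command is only constructed here, not run.

```python
from pathlib import Path


def fods_command(spreadsheet, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command converting a spreadsheet to flat ODS (FODS)."""
    return [
        soffice,
        "--headless",            # no GUI, suitable for a build server
        "--convert-to", "fods",  # target format chosen by extension
        "--outdir", str(out_dir),
        str(spreadsheet),
    ]


def expected_output(spreadsheet, out_dir):
    """LibreOffice keeps the input's base name and swaps the extension."""
    return Path(out_dir) / (Path(spreadsheet).stem + ".fods")
```

A build step would then pass the list to something like `subprocess.run(fods_command("ledger.xlsx", "build"), check=True)` and pick up the file at `expected_output(...)` for the later clean-up and per-row transformations.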
Tweaked the XML CSS to make the XML readable; the HTML is very rudimentary and needs work. After that, I'll generate a bunch of indexes of various kinds (by name, by vessel, by date, by amount, etc.), or maybe a single sortable table with columns for these.
Had some discussion with HR about how file and folder permissions need to be handled; she's still working on that on nfs, while we get the netlinks for the other folks that will enable us to use ACLs properly. Meanwhile, a lot more audio has been generated, so I ran my script to generate MP3s, and discovered a bug in it (ffmpeg can't create a file where the containing folder doesn't yet exist). Fixed that, ran the script again, and many MP3s were created and uploaded to the server as intended.
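The fix for the ffmpeg bug amounts to creating the output folder before ffmpeg is invoked. A minimal sketch, assuming the script builds its command as a list; the function name and the encoder settings are illustrative, not the ones the real script uses:

```python
from pathlib import Path


def ffmpeg_mp3_command(wav_path, mp3_path, ffmpeg="ffmpeg"):
    """Ensure the target folder exists, then return an ffmpeg wav-to-mp3 command."""
    mp3_path = Path(mp3_path)
    # ffmpeg will not create missing parent folders, so do it here.
    mp3_path.parent.mkdir(parents=True, exist_ok=True)
    return [
        ffmpeg,
        "-y",                       # overwrite an existing output file
        "-i", str(wav_path),
        "-codec:a", "libmp3lame",   # assumed encoder choice
        "-qscale:a", "2",           # assumed VBR quality setting
        str(mp3_path),
    ]
```

The returned list can be handed to `subprocess.run(..., check=True)`; `exist_ok=True` makes the folder creation safe to repeat on every run.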
home1t ran out of space overnight due to the scale of the Zotero backups. I think we'll be relying on Zotero for at least a couple of years yet, so we can predict that we'll need 2.5 TB for that alone; meanwhile, the oral histories cluster is generating audio and video. So GN has put in a request for a new filesystem, and we're investigating the possibility of ACLs being available on it so that we could better support the needs for access by the large LOI project group, while allowing smaller subgroups read-write access. Meanwhile, the OH group is uploading stuff into AtoM on an experimental basis.
Detailed discussions with HR around file and folder permissions, and file naming, leading to a Python script which does some fancy footwork to maintain a folder on loi/www that mirrors all the wav files in oralhistory as mp3 files on the server; the latter are controlled by .htaccess, the former by file system permissions. This is a bit ugly and not ideal -- lots of rsyncing going on, since I can't actually generate mp3s on the server -- but it works for now.
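The core of the mirroring logic can be sketched as below. This is an illustration under assumptions rather than the script itself: the function name is made up, and treating an mp3 as stale when it is older than its wav is one plausible rule, not necessarily the one in use.

```python
from pathlib import Path


def plan_mirror(wav_root, mp3_root):
    """Return (wav, mp3) pairs for wav files whose mirrored mp3 is missing or older."""
    wav_root, mp3_root = Path(wav_root), Path(mp3_root)
    jobs = []
    # Note: the "*.wav" pattern is case-sensitive on most Linux filesystems.
    for wav in sorted(wav_root.rglob("*.wav")):
        # Same relative path under the mirror root, with the extension swapped.
        mp3 = (mp3_root / wav.relative_to(wav_root)).with_suffix(".mp3")
        if not mp3.exists() or mp3.stat().st_mtime < wav.stat().st_mtime:
            jobs.append((wav, mp3))
    return jobs
```

Each pair would then be converted locally and the mirror folder rsynced up to the server, which keeps repeat runs limited to new or changed recordings.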
At SF's request, a spreadsheet of owners is now generated as part of the build.
Worked out the Land Titles diagnostic routines, implemented and tested them, and rewrote the build file to include them and to do a better job of archiving artifacts. More tests will probably be added.
Spent most of the day on the Landscapes directories data, which was a bit disorganized:
- Normalized filenames, titles and xml:ids in the directories files.
- Fixed hundreds of fairly trivial errors in the XML encoding. Many should be catchable with Schematron; we should look at that.
- Extended our schema so it will support other files that were originally marked up with tei_all.
- Wrote HTML output XSLT for all the directories files, and merged it into the build process.
Heard back from the Land Titles researchers with a list of diagnostics they'd like to see, so I'll write that tomorrow.