This is a requirement for the JSON search engine, which has now become the main priority for the website. I'm implementing Porter2 as described here, with some guidance from the JS implementation by Kristopolous, although I'm already finding that XSLT will require a different approach in many ways. I'm building a unit test in XSpec as I go. I'm also learning from JT's implementation of the original Porter algorithm, although the two algorithms differ significantly. 120 minutes.
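To give a flavour of the XSLT approach, here's a minimal sketch of step 1a of the algorithm, plus the kind of XSpec scenario I'm building alongside it. The function name, ex: prefix, and namespace binding are my own placeholders rather than anything in the codebase, and the treatment of y as a vowel is simplified:

<!-- Hypothetical sketch of Porter2 step 1a; assumes a stylesheet which
     declares xmlns:xs="http://www.w3.org/2001/XMLSchema" and binds the
     ex: prefix to a project namespace. -->
<xsl:function name="ex:step1a" as="xs:string">
  <xsl:param name="word" as="xs:string"/>
  <xsl:choose>
    <!-- sses: replace by ss -->
    <xsl:when test="ends-with($word, 'sses')">
      <xsl:sequence select="replace($word, 'sses$', 'ss')"/>
    </xsl:when>
    <!-- ied/ies: i if preceded by more than one letter, otherwise ie -->
    <xsl:when test="matches($word, '(ied|ies)$')">
      <xsl:sequence select="if (string-length($word) gt 4)
                            then replace($word, '(ied|ies)$', 'i')
                            else replace($word, '(ied|ies)$', 'ie')"/>
    </xsl:when>
    <!-- us/ss: do nothing -->
    <xsl:when test="matches($word, '(us|ss)$')">
      <xsl:sequence select="$word"/>
    </xsl:when>
    <!-- s: delete if the preceding part contains a vowel which is not
         immediately before the s -->
    <xsl:when test="matches($word, '[aeiouy].+s$')">
      <xsl:sequence select="replace($word, 's$', '')"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$word"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

And a matching XSpec scenario (inside the usual x:description wrapper):

<x:scenario label="step1a: ties becomes tie">
  <x:call function="ex:step1a">
    <x:param select="'ties'"/>
  </x:call>
  <x:expect label="a single-letter stem keeps ie" select="'tie'"/>
</x:scenario>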
Did a lot more bugfixing, and tested with the newly-ready Cornhill 1860; I think the script is now robust. We also ran an intro-to-XPath workshop at this morning's meeting. 180 minutes.
Found some bugs in my sql-to-tei code, specifically in the script which writes scripts to add and move XML files following the merge of data. I've now fixed and tested the code and it seems to be OK; the fallout took a couple of hours to fix, though, since a handful of files across a number of directories had been duplicated (the new file with a different filename was not added to the repo, and the old file was not moved to obsolete). I fixed all of these carefully, and dealt with a merge conflict this caused for KSHF.
Also doubled the size of a db field per AC.
Added two new fields: a VIAF id field, which is rendered as a link to the VIAF record, and a hashtag field for the persons table. The latter required a rewrite of the diagnostics, and some testing. 120 minutes.
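The VIAF rendering amounts to something like the hedged sketch below; the viafId element name is a placeholder of mine, though VIAF record URLs really do follow the https://viaf.org/viaf/{id}/ pattern:

<!-- viafId is an invented name; turn a bare VIAF id into a link. -->
<xsl:template match="viafId[normalize-space(.)]">
  <a href="https://viaf.org/viaf/{normalize-space(.)}/">
    <xsl:value-of select="concat('VIAF: ', normalize-space(.))"/>
  </a>
</xsl:template>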
Our weekly meeting analysed and discussed some interesting edge cases among linked poems. I have new TODOs from the meeting. 150 minutes.
We discovered some broken images on the file server; they turned out to be TIFFs named *.jpg. I wrote the quick script below to check all the images on the server. It found 44, which we'll replace when we have the chance.
#!/bin/bash
# Check every *.jpg under the current directory; list any file whose
# actual content is not JPEG.
echo "Checking all images to see if they are really images."
: > temp.txt
find . -iname "*.jpg" -type f -print0 | while IFS= read -r -d '' file; do
  info=$(file "$file")
  if ! [[ $info =~ 'JPEG' ]]; then
    echo "$file" >> temp.txt
  fi
done
lines=$(wc -l < temp.txt)
echo "List of images which are not real JPEGs." > falseJpegs.txt
echo "----------------------------------------" >> falseJpegs.txt
echo "" >> falseJpegs.txt
sort temp.txt >> falseJpegs.txt
echo "Found $lines broken images, listed in falseJpegs.txt."
The Jenkins build broke because some of the new files being processed contain keywords that trigger the build-failure rules. Fixed this by editing the log parsing rules. 20 minutes.
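For the record, the fix looks roughly like this, assuming the rules-file format of the Jenkins Log Parser plugin (the patterns here are invented for illustration, not our actual rules). Rules are applied in order, so a specific ok rule placed above the generic error rule stops matching lines from the data files from failing the build:

# Invented patterns: pass lines quoted from data content
# before the generic rule below can fail the build.
ok /The Ballad of the Fatal Error/
error /ERROR/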
Following discussion with AC, added a new diagnostic to catch cases where pseudonyms have possibly been omitted from the db record. Also added a complete listing page for all historical people; added more info to their individual person pages; and switched on rendering of non-transcribed poems, since these pages all link to them. Need to do more work on making those pages look more useful, but it's a start. Also discussed creating a file full of linkGrp elements, as MoEML has. 240 minutes.
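The pseudonym diagnostic works along these lines; this is a hedged sketch over an entirely invented data shape (poem elements carrying signature and personRef attributes, person records with name and pseudonym children), not the project's actual structures:

<!-- Flag poems whose signature matches neither the person's name nor
     any recorded pseudonym (all names here are invented). -->
<xsl:template name="pseudonymCheck">
  <ul>
    <xsl:for-each select="//poem[@signature]">
      <xsl:variable name="p" select="id(@personRef)"/>
      <xsl:if test="not(@signature = ($p/name, $p/pseudonym))">
        <li>Poem <xsl:value-of select="@xml:id"/> is signed
          "<xsl:value-of select="@signature"/>", which matches neither the
          name nor any recorded pseudonym of <xsl:value-of select="$p/name"/>.</li>
      </xsl:if>
    </xsl:for-each>
  </ul>
</xsl:template>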
A number of changes arising out of discussions yesterday and today:
Documentation and rendering for choice/sic/corr implemented and tested (a sketch of the rendering approach follows this list).
Link to the XML source now rendered on the poem page (needs prettying up).
Discussion on how to encode elision initiated on TEI-L.
Fixes to rendering to handle previously-unexpected scenarios in which lg elements appear within notes and/or epigraphs, and should not be processed or counted as lines in the main poem.
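The choice/sic/corr rendering mentioned above is essentially the following: a hedged sketch rather than the actual project template, with an invented class name and tooltip convention, and assuming xpath-default-namespace is set to the TEI namespace. The corrected reading is displayed, and the original reading is kept available as a tooltip:

<!-- Assumes xpath-default-namespace="http://www.tei-c.org/ns/1.0". -->
<xsl:template match="choice[sic and corr]">
  <span class="corr" title="Source reads: {normalize-space(sic)}">
    <xsl:apply-templates select="corr"/>
  </span>
</xsl:template>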
I've been working on generating JSON files in support of the search, mostly with the aim of supporting the search filters, as we do in other projects like Keats. The work is going well, but what I'm not yet clear about is where to draw the line between convenience and file size. For example, I could include details of the periodical title and folder path with every poem retrieved under any category, or I could include only an id and keep a small separate JSON file of periodical information to look those details up when needed. Still thinking about all of this.
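To make the trade-off concrete, here's a hedged illustration with invented field names and values rather than the actual files. The denormalized shape repeats the periodical details in every poem entry:

{"id": "poem_0123", "title": "A Sample Poem",
 "periodicalTitle": "The Cornhill Magazine", "path": "cornhill/1860"}

The normalized shape carries only an id, resolved against a small lookup file (periodicals.json, say) which is fetched once:

{"id": "poem_0123", "title": "A Sample Poem", "periodical": "cornhill"}

{"cornhill": {"title": "The Cornhill Magazine", "path": "cornhill/1860"}}

The first costs bytes in every result set; the second costs an extra fetch and a lookup at render time.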