ISE3: Getting familiar with the build processes

I've been doing some modularization of the ant build processes in the ISE3 repo, and taking the opportunity to get familiar with JT's work so far. I'm going to start work on the HTML output next, leaving the annotation/collation stuff to him.


Merge of facsimile work

MT did a lot of work encoding the facsimiles using feature structures, and I've now merged that into the repo. It was messy because of the horrible tangle of "externals" we have, which are not really external; they're just local relative links. The external pointing from data/sch to sch did not update itself automatically; I had to delete the files in that folder and svn up to get it to refresh them. Annoying.


XWiki Annotations

XWiki annotations are now embedded inline for the critical documents (i.e. documents in the crit directory) using the <note> element. Each altered file was diffed and checked for accuracy. There were a few instances of bad pointers, which I had to fix by hand: it seems that a few TLNs changed in some of the documents, so the entire annotation set was off by 2-3 TLNs.


Met with JJ and JT. Results: I've overwritten the current data/text versions of the test plays with my own generated versions, which now have glyph encoding; in the case of H5, I actually ran the glyph processing on the existing versions since JM confirms he's been editing those directly (although not since July). JT has brought back some of the collation/annotation offset-checking code from GitHub so we can integrate that. We've decided not to hard-convert editor-specified offsets, which are based on string matches, into anchors until rendering time, because they're easier for editors to work with while they're text-based. We will, though, provide an easy way for editors to check them.
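To make the checking idea concrete, here is a rough sketch in Python of what a check on a string-based pointer might look like. This is not the code JT brought back from GitHub, which I'm not reproducing here; the function name and the three result labels are invented for illustration, and it assumes a pointer is simply a string that should occur exactly once in the target line.

```python
def check_match(line_text, target):
    """Classify how a string-based pointer resolves within one line.

    Hypothetical helper: returns "ok" if the target occurs exactly once,
    "missing" if it does not occur (or is empty), "ambiguous" otherwise.
    """
    if not target:
        return "missing"
    # str.count counts non-overlapping occurrences; matching is case-sensitive.
    occurrences = line_text.count(target)
    if occurrences == 0:
        return "missing"
    if occurrences == 1:
        return "ok"
    return "ambiguous"


if __name__ == "__main__":
    line = "To be, or not to be"
    for target in ("or", "be", "question"):
        print(target, "->", check_match(line, target))
```

Something along these lines could be run over every annotation and collation at build time, so editors get a report of pointers that no longer resolve without anyone having to convert them to anchors.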


More work on conversion

Found some more minor validity issues in the output from the conversion; dropped Titus (not ready for prime time), and worked on the remaining six until all were valid. We're now ready to look at annotations and collations.


Full integration of char work; more refinement of conversion

The character work I'd done and tested actually wasn't getting called in the build process because of a pre-existing set of templates that were converting some of the entities. Took out the old conversion templates and did a bit more work on mine, and finally H5 was correctly converting. Added in four more of the test plays, did some more tweaking and fixed some errors in the original IML, and then added in Timon and Twelfth Night to make a set of seven plays. These are now not only validating against tei_all but also against the ISE3 schema. We're making progress. Next is Titus, after which we'll move on to annotations and collations.


Character work integrated into conversion process

Ironed out the last of the bugs, and tested with the five pilot works; they threw up a couple of other bugs in the conversion, which I've also fixed. I think we're now ready to move forward.


More work on characters

There are some very weird things in the IML. I have thirty-odd weird entities left to figure out, but they're getting stranger and stranger. At some point we'll reach diminishing returns, so I'll just put something in the output that requires human intervention.
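As a sketch of the "requires human intervention" fallback, something like the following Python could mark whatever is left over. It assumes that any remaining curly-braced sequence is an unresolved entity, and the [[UNRESOLVED ...]] marker is just a placeholder I've made up; in the real output it would more likely be an element or a comment.

```python
import re

# Assumption: anything still in the form {…} (no spaces, no nested braces)
# after conversion is an entity nobody has handled yet.
UNRESOLVED = re.compile(r"\{[^{}\s]+\}")


def flag_unresolved(text):
    """Wrap leftover entities in a visible marker.

    Returns the marked-up text and the list of entities found, in order.
    """
    found = []

    def mark(match):
        found.append(match.group(0))
        return "[[UNRESOLVED " + match.group(0) + "]]"

    return UNRESOLVED.sub(mark, text), found


if __name__ == "__main__":
    marked, leftovers = flag_unresolved("a {xq} b")
    print(marked)
    print(leftovers)
```

The returned list makes it easy to produce a per-file report of what still needs a human decision.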


Work on characters

I've worked through most of the curly-braced entities and built on the taxonomy JT created for glyphs, splitting it into two (ligatures and single chars), just to aid in clarity; I've created equivalences in the form of choice elements in a list in a TEI file, and that can be plugged into any transformation. The idea would be to order the items by length descending, so the long ones are done first, and have a template that matches text() and runs all the equivalent replacements using analyze-string. We may have to build the analyze-string element mechanically. Alternatively, we could just run a scripted command-line replace.
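For the scripted alternative, a minimal Python sketch of the longest-first idea might look like this. The entries in the table are invented for illustration (the real pairs would be read from the choice elements in the TEI file), and the function names are my own.

```python
import re

# Hypothetical equivalences, for illustration only.
EQUIVALENCES = {
    "{s}": "\u017f",    # long s (single char)
    "{s}t": "\ufb05",   # long s + t (ligature)
    "{ct}": "ct",       # placeholder replacement
}


def build_pattern(equivalences):
    # Sort keys longest first so "{s}t" is tried before its prefix "{s}".
    keys = sorted(equivalences, key=len, reverse=True)
    # Escape each key: braces are special characters in regular expressions.
    return re.compile("|".join(re.escape(k) for k in keys))


def replace_entities(text, equivalences=EQUIVALENCES):
    pattern = build_pattern(equivalences)
    # One pass over the text; each match is looked up in the table.
    return pattern.sub(lambda m: equivalences[m.group(0)], text)


if __name__ == "__main__":
    print(replace_entities("mo{s}t {s}trange"))
```

Building one alternation from the sorted keys is the same trick we'd need for a mechanically generated analyze-string: the ordering of the alternatives is what guarantees the long forms win.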


Plan for identifiers; some work on figuring out how the transformations have changed

Came up with a detailed plan for identifiers, for consideration by JT and JJ; then started looking at the stuff that's been done to the original transformation process I had running. The changes are mostly mysterious and undocumented.

