Category: "Activity log"

Meeting with JT: plans for ODD processing

March 8th, 2019

Met with JT to thrash out plans for processing ODDs.

Mtg w JJ and new build for TCP

March 4th, 2019

Met with JJ to discuss priorities, which was very helpful. She laid out some ideas for documentation and structure; she's going to get working on some texts to help flesh out some real life encoding.

To do so, she needs some texts from the TCP, which we've retrieved before. Since we've needed to do this a few times, I've decided to automate the process slightly:

  • First, I downloaded the JSON version of the TCP catalogue, and converted it to a much smaller XML file, deleting fields that we don't need
  • Then I wrote another XSLT that resolves an STC number to a TCP number, if available, retrieves that TCP document, and passes it through the (existing) set of templates for converting TCP to LEMDO
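The lookup step can be sketched roughly like this (the file name, element names, and catalogue structure here are assumptions for illustration, not the actual trimmed catalogue format):

```xml
<!-- Hedged sketch: resolve an STC number to a TCP id via the trimmed
     catalogue, then fetch the TCP document and hand it to the existing
     TCP-to-LEMDO templates. All names are illustrative. -->
<xsl:variable name="catalogue" select="doc('tcpCatalogue.xml')"/>

<xsl:function name="hcmc:stcToTcp" as="xs:string?">
  <xsl:param name="stc" as="xs:string"/>
  <xsl:sequence select="$catalogue//record[stc = $stc]/tcp/text()"/>
</xsl:function>

<xsl:template name="convertFromStc">
  <xsl:param name="stc" as="xs:string"/>
  <xsl:variable name="tcpId" select="hcmc:stcToTcp($stc)"/>
  <xsl:if test="exists($tcpId)">
    <!-- Retrieve the TCP document and run the existing conversion. -->
    <xsl:apply-templates
        select="doc(concat($tcpBaseUri, $tcpId, '.xml'))"
        mode="tcpToLemdo"/>
  </xsl:if>
</xsl:template>
```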


February 20th, 2019
Lots of meetings with basically everyone while I was in town.

TEI Server workaround

February 10th, 2019

I needed to regenerate the schema, but the TEI server outage was causing issues. A simple fix was to switch the defaultTEIServer parameter and the defaultTEIVersion parameter in the XSLT. I've now codified this in the ANT build as a set of conditions: by default we use the TEI server, but if that's down for whatever reason, we fall back to the TEI Jenkins server.
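A minimal sketch of the fallback, assuming Ant's built-in http condition; both URLs are illustrative placeholders, not the build's actual values:

```xml
<!-- Sketch: probe the main TEI server; if it responds, use it as
     defaultTEIServer, otherwise fall back to the TEI Jenkins server.
     The URLs are illustrative, not the project's configuration. -->
<condition property="defaultTEIServer"
           value="https://www.tei-c.org/Vault/P5/"
           else="https://jenkins.tei-c.org/job/TEIP5/lastSuccessfulBuild/artifact/P5/">
  <http url="https://www.tei-c.org/"/>
</condition>
```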


February 9th, 2019

Extensive mtg with MH about LEMDO; mostly just catching up, but we also came up with a plan for various copies (for lack of a better term) in collations, and I sent an email about that. Also started to work through issue 328 on GitHub, but ran into some issues with the git pull stuff (which I'll figure out myself).

Then, met with JM twice, to work through encoding challenges and to resolve some issues; fixed some of the code template stuff in the process and he updated his oXygen (he was running version 16). Worked through some encoding problems and some errors, including some pretty printing problems; we really should see if we can disable that by default in a project.


February 6th, 2019

JM sent a bundle of files for KNDW edition--including a selection of annotated primary sources, an OS text, a modern text, and an essay--for conversion, which I've now done. It was an interesting experiment: most of these documents were in older forms of IML, which we didn't necessarily know how to handle. So I've modified the IML conversion slightly to allow for an AB element, so we can fix hierarchy issues more easily.

odd2lite work

February 5th, 2019

Over email, discussed Stylesheet work with MH; he has created a branch and we're going to do a dive into the odd2lite stuff. This will be particularly useful for LEMDO, but will be broadly useful for any of the Endings projects.

Mtg w JM

February 5th, 2019

Biweekly meeting with JM and worked through a number of issues. Came up with a plan for the next two weeks:

  • JM's SVN access isn't working at the moment, so he's getting me to merge his Ado_M [DONE]
  • Rom_Q2M was out of date; reconvert [DONE]
  • Convert Err for editors and add linebreaks to the OS as test case for bare display
  • Start thinking about metadata display (discuss with MH and PS)
  • Add schematron to use em-dash instead of double dash [DONE]
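The double-dash rule in the list above amounts to a short Schematron check. A sketch (the contexts here are illustrative; the actual rule in the LEMDO ODD may differ):

```xml
<!-- Sketch: flag literal double hyphens in text, which should be
     em-dashes instead. Context elements are illustrative. -->
<sch:pattern>
  <sch:rule context="tei:p | tei:l">
    <sch:report test="contains(., '--')">
      Use an em-dash (&#x2014;) instead of a double dash (--).
    </sch:report>
  </sch:rule>
</sch:pattern>
```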

Conversion and schema process

January 28th, 2019

Added some new schematron for common problems that arise after conversion and have to be fixed by hand, including embedded verse in prose (which could be done programmatically but is safer to do by hand to confirm lineation, etc.) and other common errors (unescaped angle brackets et cetera). The idea here is that if the texts pass the basic TEI conversion (which they basically all do), the converter must then check each text and ensure it is valid against the LEMDO schema; if not, the texts must be modified slightly by hand to fix things that require human intervention. It's a bit more involved, but it stops absent-minded conversion (which I am guilty of).

Fixed a ton of small issues (mostly embedded verse, but some other small things) and converted Mac and 1H4 as per JM.

Mtg w JM and acting on action items

January 21st, 2019

Long meeting with JM to discuss progress; discussed plans and priorities going forward. He also asked for a better display of overlapping annotations, which I agreed with; he wants everything that crosses multiple lines to be displayed as a line on the side rather than multiple types of underline. This is, of course, a rendering issue primarily, but we do need to get the algorithm in place to get the annotations and collations embedded in the HTML so that rendering decisions are easily handled later.

So, reworked the code so that if something goes over two lines, the behavior is slightly different. One way to resolve this would have been to create container divs, but I decided against that since it's a lot of manipulation of the HTML; instead, we'll just add an onclick event to the div itself and a left border, and make sure the events don't propagate. Seems to be working alright; more work will likely need to be done, but it's mainly aesthetic at this point.

Major rework of annotation matching code

January 17th, 2019

I realized that the annotation matching code was a bit too generous with its selection of nodes, due to the preceding:: and following:: axes. I've now reworked it into something much more efficient and cleaner. The problem is that there are a number of different ways in which an editor might select a bit of text to annotate; the best way would be to point directly to an element (say an l or an lg), but most often editors want to gloss particular words or phrases using an anchor, so we have to handle both (and both in combination: say something starts at a line but ends halfway through another). The solution is to first generate all the nodes between the two boundary points (using the incredibly fast << and >> operators, which I've just learnt about) and then group them based on ancestor nodes that were in that original selection of nodes. (That's not a very elegant way to describe it, but it's all documented heavily in the XSLT.)
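The selection-and-grouping step described above reduces, roughly, to a couple of expressions like these (variable names are illustrative; the real logic lives in the LEMDO XSLT):

```xml
<!-- Sketch: gather everything between two anchors with the node-order
     operators, then keep only nodes whose ancestors are not themselves
     in the selection (i.e. the top-level members of the range). -->
<xsl:variable name="between" as="node()*"
    select="$start/following::node()[. &lt;&lt; $end]"/>
<xsl:variable name="topLevel" as="node()*"
    select="$between[not(ancestor::* intersect $between)]"/>
```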

This new process shored up a number of issues with the annotations. Many of them are aesthetic, some are functional, and some are processual. We'll need a set of diagnostics to check whether something that was annotated with anchors could be better annotated by pointing to a line or a line group. That won't be trivial, so it'll take some time.

H5 conversion

January 15th, 2019

After corresponding with JM, determined list of priorities for the conversion. I decided to do H5 first since it is one of the most densely annotated/collated and the editorial process should be swift.

Converted H5_FM and Q1M and got annotations working; there were a few bugs in the HTML code, which I fixed. Also discovered the issue with part lines; the IML usually flagged the part lines with an empty line preceding the inferred line (i.e. via one of the LNs), so re-wrote bits of the conversion to handle that. Also had to make various fixes to the apparatus linking code to handle supplied tags.

Mtgs and conversion

January 8th, 2019
Mtg with JM to catch up and discuss the current state of the project. Returned to the conversion code and had to fix a number of problems with iembeds and ilinks, which will require some policy decisions going forward.

Annotation highlighting

December 4th, 2018
Annotation highlighting is more-or-less working now. Most of the work is done in an XSLT module; it will probably need to be modified to handle collations and will need some serious checking to ensure that it works for all the various cases, but, as far as I can tell, it's working well. Ran into some issues with data-* attributes and case sensitivity, but those should now be resolved.

Meeting with JT

November 30th, 2018

Discussed plans and evolved a new idea for an ODD for ODD/Documentation with its own processing, to be part of the Endings project. More coming on this.

Between two anchors

November 7th, 2018

I've been trying to figure out the best way to retrieve all nodes between two anchors; I'm not changing the document hierarchy (i.e. I'm not tagging them), but I want them for labelling purposes etc. But doing following:: and preceding:: or using the following/preceding operators (<< and >>) with intersect isn't very efficient.

It's nice because we have @xml:ids on these things, so it's easy enough to know when to stop. It's not generalized for non-id'd things (although, I bet you could use generate-id(.)).

<xsl:function name="hcmc:betweenTwoAnchors" as="node()*">
  <xsl:param name="anchor1" as="node()"/>
  <xsl:param name="anchor2" as="element(anchor)"/>
  <xsl:variable name="f1" select="$anchor1/following::node()[1]" as="node()"/>
  <xsl:variable name="f2" select="$f1/following::node()[1]" as="node()?"/>
  <xsl:copy-of select="$f1"/>
  <xsl:choose>
    <xsl:when test="($f2/self::anchor and $f2/@xml:id = $anchor2/@xml:id) or empty($f2)"/>
    <xsl:otherwise>
      <xsl:copy-of select="hcmc:betweenTwoAnchors($f1, $anchor2)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
Basically, it works by taking in two boundary points, using the left one and stepping to the next node; if the node immediately following that one is the anchor tag, then exit; if it's not, run the thing again, but using the next node ($f1). This is all actually documented in the LEMDO code, but thought I should put it here in case.

ODD, Schematron, and XSLT

September 12th, 2018

Working on the ODD to allow teiCorpus (which was trivial), but then spent a long time trying to get a schematron rule in place to check the attributes of a processing-instruction. Note that XSL variables do work in schematron patterns, but functions and other bits (at least with the current stylesheets) do not.
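Something like the following illustrates the shape of such a rule (the context and pseudo-attribute here are illustrative, not the actual LEMDO constraint):

```xml
<!-- Sketch: processing instructions have no real attributes, so the
     rule inspects the PI's string value for the pseudo-attribute it
     expects. -->
<sch:pattern>
  <sch:rule context="processing-instruction('xml-model')">
    <sch:assert test="contains(., 'schematypens=')">
      The xml-model processing instruction should declare a
      schematypens pseudo-attribute.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```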

The second issue was that the schematron building code (that I lifted from MH's Keats code) does not automatically process "foreign elements" (i.e. XSLT variables) in the final XSLT step (iso-to-svrl-2.xsl). You have to pass an allow-foreign="true" parameter to the XSLT for it to work as expected.

Adding egXML and CSS validation

August 30th, 2018

Following MoEML, I added the egXML and CSS validation steps to the buildValidate.xml build. Had to modify it slightly, since MoEML often uses collections whereas the builds currently rely more on filesets. It simplified things slightly.

Had to deal with quite a bit of fallout, since our egXMLs were either invalid or our schema wasn't handling things properly.

More meetings

August 15th, 2018

Phone and personal meetings with three people re the latest developments; slight reshuffling of plans.

Meetings and planning

August 14th, 2018

Meetings today with MC, JJ and JT, with lots of planning and the writing of a sitrep to pass to MB.


July 12th, 2018

Met with JJ and MH to discuss plans moving forward, particularly documentation and plans. We're going to make sure our documentation is easily navigable from the outset, with maps and decision trees giving multiple pathways to get to a particular bit of documentation.

Also made some headway on the build processes themselves; I've been able to modularize the builds so that the master build (lemdo/build.xml) calls a number of other builds (all with suffix '_master.xml') that themselves call either XSLTs or other modular build processes. Each of the smaller master builds can be run on its own (each also includes the globals module) or from the master one.

XWiki conversion

July 11th, 2018

Working on the XWiki conversion; oddly, the converter tool that XWiki uses to export the texts is inconsistent. If you use the "texts" space, it gives you a fragment with a content element (in no namespace); if you don't, it gives you the full XHTML page in the XHTML namespace. So if you want to download all of the texts associated with an edition (e.g. everything in edition 1HW), you use the texts space and convert the content element; if you want to get something like a paratext page, you have to convert the body in the XHTML namespace.

I now have a conversion working that first turns the former situation into the latter and then converts it all the same way; I also worked a lot on the linking, file naming, and other parts of the HTML conversion. The conversion looks good and gives us an excellent starting point; there will need to be lots of edits to the pages, but otherwise, they are good. We can cut off XWiki anytime now.

Eliminating rend

July 10th, 2018

Spent some time implementing what MDH, JJ, and I discussed regarding @rendition and @rend. We won't use @rend; we will always use @rendition or @style, with our own prefix ('rnd') for values initially based on simplePrint.

The next challenge will be a way to say: either choose from this list OR use one defined within the file. I don't think you can do that in Pure ODD, so I'll have to figure out some schematron to constrain that.
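A first stab at that constraint in Schematron might look like this (a sketch assuming an XSLT 2 query binding; the rnd: prefix is from the discussion above, but the paths and names are assumptions):

```xml
<!-- Sketch: each @rendition pointer must either use the project-wide
     rnd: scheme or match a rendition declared locally in the file's
     own tagsDecl. -->
<sch:rule context="*[@rendition]">
  <sch:let name="ptrs"
           value="tokenize(normalize-space(@rendition), '\s+')"/>
  <sch:let name="localIds"
           value="ancestor::TEI//tagsDecl/rendition/concat('#', @xml:id)"/>
  <sch:assert test="every $p in $ptrs satisfies
                    (starts-with($p, 'rnd:') or $p = $localIds)">
    @rendition values must come from the rnd: scheme or from a
    rendition defined in this file.
  </sch:assert>
</sch:rule>
```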

DRE site

July 5th, 2018
Continuing to Endings-fy the DRE site; I'm not aiming for a perfect reproduction since the DRE team + PS have the design know-how, but I am trying to get most of the structural elements in place so that it approximates the original. Finding lots of errors and fixing them as I go.

DRE sitepage framework

July 4th, 2018

Lots of work on the DRE framework. I have a very quick and dirty imitation of the current site (but using grid, so it flexes nicely); it's not perfect, but it's helpful for visualizing the content. Made a test build to see how the original XML looks when moving in the HTML and it's all looking fine.

DRE Framework

July 3rd, 2018

Started thinking seriously about the site page framework. MDH created a blank HTML site page, which has all the semantic bits that have to be filled in. I like this idea a lot, especially if we combine it with a flex+grid approach in CSS. Did some reading up on that, played around with grids, and then converted some of the About pages into TEI.


June 25th, 2018
Continuing work on the conversion; spent some time clicking around to do a bit of an inventory of pages. Everything looks fairly straightforward. I also spent some time looking at the facsimiles and trying to come up with a good solution for how to convert them.

Supplementary documents

May 28th, 2018

Mostly working on supplementary documents and corresponding with JL and CW. These documents are coming along nicely.

More DRE conversion

May 25th, 2018

Converted 3LL, 3L3LL, OWT, and TTR3 from DRE. These were fairly straightforward and look fine in the output. The rest of the QME texts have specific issues, which I've emailed JJ about.

DRE text conversion

May 24th, 2018

Converted FV, FBFB, and Leir from the QME site; the conversion seems to be working well still, but every text throws a new wrench. These look good, though, but they are the already published ones; the next ones might be a bit trickier.


May 14th, 2018
Met with JL, AH, and JJ to discuss next steps for LEMDO. Lots of great questions and discussion, which JL and AH will document. That documentation will need to be turned into TEI somehow, but we haven't discussed how to do it just yet; they're currently doing it all in Google docs and creating an index in Google Docs.


May 9th, 2018

Lots and lots of work on the conversion code and converting the DRE texts. Tried three (1HW, 2HW, AHDM) and they only required a few small tweaks to get them working. The annotations and XWiki documents are also working, which was a nice surprise. The code had to be tweaked a fair bit and it certainly requires some better documentation, but it's coming along.

Also created a new build in conversion/buildEverything.xml. This code calls the three other builds (buildSgml, buildApparatus, buildXWiki) so that we can build a number of documents fully. There is now a parameter for works, which is a list of works (AYL, H5, MND) and the build iterates over the list, doing the full conversion for each work. Worked well for DRE. (To note: each individual build works by itself and does not depend on buildEverything in any way.)

ANT Java dialogs appearing behind oXygen fix

May 8th, 2018

Pop-up dialog boxes created from embedded JavaScript in Ant were appearing behind oXygen, which wasn't very user friendly. I posted a ticket on the oXygen forum and the solution suggested there worked. Basically, I had to create an empty JFrame and then place the other dialog boxes on top of it. It slows down the process slightly, but I'd rather sacrifice speed for usability. Code posted below for future reference:

//Create a new JFrame
var frame = new javax.swing.JFrame();

//Set it on top
frame.setAlwaysOnTop( true );

//Set location
frame.setLocationByPlatform( true );

//Close it when it closes
frame.setDefaultCloseOperation( javax.swing.JFrame.DISPOSE_ON_CLOSE );

//No decoration
frame.setUndecorated( true );

//Display it
frame.setVisible( true );

//Now get the inputter for the message
var inputter = new javax.swing.JOptionPane();

//Put the dialog there and show the message with the frame as its parent.
var message = javax.swing.JOptionPane.showInputDialog(frame, "Please enter a short message summarizing what you did today.");

//If the message is empty (or the dialog was cancelled), break.
if (message == null || message.isEmpty()){
    javax.swing.JOptionPane.showMessageDialog(null, "No message entered. Aborting.");
    throw "Process cancelled. Aborting!";
}

//Set the message property in ANT
project.setProperty("message", message);

Jenkins job and new build

May 8th, 2018

Got the Jenkins job working with the LEMDO name and a new build up and running after a request for the id page from JL. I've split the id list generation code out from diagnostics.

I've also combined the username and password prompts into a single JavaScript dialog box, which is much nicer for the user--it's a much more familiar look and feel.

More work on editor packages

May 7th, 2018

Editor packages are working well now on my machine and the instruction documents are starting to come together. We need to start testing cross-platform, which is the next step.

Editor packages in oXygen

May 6th, 2018

Lots of work done today on editor packages in oXygen. I've now put the schematron within the ODD file, have it building out in the proper files (thanks to MH's code), and have a nice validation suite working. In data/tools/, there's a file called "validate.html"; editors can open it up, press the play button, and all of their files are validated.

I've modified MH's process schematron code slightly so that it doesn't fail immediately if it finds an error; it compiles all the errors first and prints out the message, and then fails. I like this a bit more, since I would likely use this code when debugging/testing the ODD, etc, and it's nice to see all the errors all at once.

The setup and updating builds are both working nicely; still not cross-OS compatible, but otherwise they're nearly ready to be used, I think. However, the committing stuff is melting my brain. I've looked at both svnant and svntask, and tried nearly everything I could find online, but nothing seems to work. More work to be done there.

Edit: I was misspelling the variable (I was passing ${password} and not ${pwd}). A very silly mistake, which I will chalk up to the combination of travel, conferencing, and sun over the last few days.

Work on SVN packages

May 4th, 2018
Got the first SVN package working, thanks to MH's advice yesterday. It is working on my machine, but we'll have to test consistently across machines and OS. Still need to make it compatible for Windows, but so far it's working nicely. I still need to write the code to get schemas and oxygen XPR files. That will still take some time, but I think the commit / up stuff should go faster. I'll write these three first, and then do the template files.

Working on summer plans

May 4th, 2018
With JJ and MH, figured out summer plans for LEMDO.

LEMDO restructuring

May 1st, 2018

The LEMDO repository is now restructured and renamed. Save for a likely few errant strings, everything should now be "emd" (early modern drama) or "lemdo". It came together quite quickly and enabled me to fix some things in the code. We'll still have to fix the conversion code (esp. in regards to paths), but that should be fairly straightforward.

Also started to work on the oXygen/ANT utilities for editors. oXygen editor variables (even the password ones) aren't suitable: they are passed as parameters to the ant build, so they show up in the ANT text stream at the bottom of oXygen, which is no good. The ant input element doesn't work either, since there is no way for ANT to prompt through the oXygen IDE (as far as I can tell, at least). After some research, I found an okay solution using JavaScript, although I struggled with the whole Nashorn/Rhino thing. I did this:

load("nashorn:mozilla_compat.js");

which I got from Stack Exchange. I'll need to consult with others, though, since I'm not all too familiar with this stuff. It works, however, and seems secure. Lots more work to do.

Playing around with ANT/CSS/Author mode

April 30th, 2018

Attempting to work with author mode, CSS, and ANT. It only works if the ANT file isn't prefixed by "build" (i.e. not associated with ANT in oXygen) and you have to work around the lack of flexibility in the ANT's XML. You can more or less force it by putting an @el='blah' on the description element; you don't get hierarchical XML, but it might be enough for a simple ANT file with some documentation.

Data structure

April 30th, 2018

Met with JJ and MH to discuss the repository structure for the LEMDO project. There was confusion to be hashed out, but we ultimately came up with a structure that I think works well. It is being forwarded to the major stakeholders to be approved.

We've also come up with the short form "emd" as a prefix.

Consultation with JT, plan for editor package

April 26th, 2018

JT and I talked at length about the work he's now doing renaming everything tagged with "ISE" that now needs to be LEMDO; and we devised a cunning plan for editor projects:

  • When an editor is getting started we first send them out an Ant build file which they open in Oxygen to set up their project.
  • When it runs, that file creates a directory, checks out the svn repo, and then runs a second ant task that does this:
  • svn update to get any changes to core files
  • svn export to get local copies of schemas (avoids externals)
  • That ant file is the one you open and press the red triangle before you start work.
  • Another ant file is the after-work version, which does svn update and svn commit.

So: no need for svn externals; no problem if connectivity is temporarily down; no need to learn svn; everything happens in Oxygen.

Repo and blog renamed

March 22nd, 2018

Renamed the repo and the blog, in keeping with our forward planning.

Ran Rom F1 and Mac F1 through SGML-to-TEI...

March 21st, 2018

...for the benefit of other people who will be working on related texts. Results were valid, no errors.

Customizing ODD

March 17th, 2018

I'm beginning the work on customizing the ISE3 ODD so that we can have a "standoff" element to store all of the database-like stuff that the rest of the Endings projects have been putting in the teiHeader. It is based on the stdf proposal, but is less concerned with linguistic annotation.

Basically, the standoff element contains model.listLike and listBibl (which is part of model.biblLike) and spanGrp (for annotations).

Note to self: Adding the standoff element (or any custom namespace element) between the teiHeader and the TEI element requires adding that namespace to the @defaultExceptions attribute in the schemaSpec element.
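In the ODD, that note corresponds to something like this (the custom namespace URI and ident are illustrative; the first and last values are the standard defaults):

```xml
<!-- Sketch: allow the custom namespace at the root level by adding it
     to @defaultExceptions alongside the standard values. -->
<schemaSpec ident="lemdo" start="TEI teiCorpus"
    defaultExceptions="http://www.tei-c.org/ns/1.0
                       http://hcmc.uvic.ca/ns
                       teix:egXML">
  <!-- elementSpec for the custom standoff element goes here -->
</schemaSpec>
```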

Combining builds

February 28th, 2018

Combined the two build files that we had in the ISE3 repo (ise3/diagnostics.xml and ise3/site/build.xml) and their associated ant_globals.xml files. It's now one build process, which by default goes through and:

  1. Validates the TEI in ise3/data
  2. Runs diagnostics on the TEI in ise3/data
  3. Then begins the static build process

Standard XML

February 25th, 2018

Working on the creation of the Standard XML, which for now means resolving pointers. Since the ISE has decided to use more granular prefixDefs (i.e. 'doc' for documents, 'pers' for person) instead of using general ones (like MoEML), prefix resolution can be more generalized. There's a template that matches every TEI attribute that has a pointer data-type and, like the Endings diagnostics code, resolves the pointer based off of prefixDefs. Seems to be working well.
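The resolution template boils down to something like this (a sketch assuming xpath-default-namespace is set to the TEI namespace; the matched attributes are a sample, not the full set of pointer-typed attributes):

```xml
<!-- Sketch: resolve private-URI pointers (doc:, pers:, etc.) against
     the prefixDefs in the header. -->
<xsl:variable name="prefixDefs" select="//prefixDef"/>

<xsl:template match="@ref | @target | @corresp">
  <xsl:variable name="prefix" select="substring-before(., ':')"/>
  <xsl:variable name="def" select="$prefixDefs[@ident = $prefix][1]"/>
  <xsl:attribute name="{name()}">
    <xsl:choose>
      <xsl:when test="exists($def)">
        <!-- Standard TEI semantics: apply matchPattern and
             replacementPattern to the part after the prefix. -->
        <xsl:value-of select="replace(substring-after(., ':'),
            $def/@matchPattern, $def/@replacementPattern)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:attribute>
</xsl:template>
```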

A bit of progress on the XSLT for ISE3 output

February 6th, 2018

Wrote a utility function for retrieving data from the taxonomies; this will be needed to complete the Dublin Core metadata in the pages.

Starting work on the HTML output

January 25th, 2018

I've decided we should build the HTML pages from a genuine template, so that anyone who knows HTML can easily edit such things as the menu items and the boilerplate content. I've set one up, and given it a basic flex-based CSS layout that shouldn't be too hard for later styling. I'm thinking about building in the small-format device rulesets from the beginning, so they don't end up being grafted on later. The basic process would be to load the template, and process it through XSLT templates, with the source XML document passed as a tunnelled parameter; that should mean we can pull anything we like from the source XML fairly easily, and meanwhile most of the boilerplate stuff will just fall through in an identity transform. XML will be processed under a distinct mode.
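The process described above can be sketched like so (mode and parameter names are illustrative, and the html prefix is assumed to be bound to the XHTML namespace):

```xml
<!-- Sketch: drive the output from the HTML template, tunnelling the
     source TEI document into an identity pass. -->
<xsl:template name="buildPage">
  <xsl:apply-templates select="doc('template.html')" mode="html">
    <xsl:with-param name="sourceDoc" select="$teiDoc" tunnel="yes"/>
  </xsl:apply-templates>
</xsl:template>

<!-- Boilerplate falls through via the identity transform... -->
<xsl:template match="@* | node()" mode="html">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" mode="html"/>
  </xsl:copy>
</xsl:template>

<!-- ...while specific slots pull content from the source XML. -->
<xsl:template match="html:title" mode="html">
  <xsl:param name="sourceDoc" tunnel="yes"/>
  <title>
    <xsl:value-of select="$sourceDoc//titleStmt/title[1]"/>
  </title>
</xsl:template>
```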

Subversion documentation

January 22nd, 2018

JM needed documentation for subversion so got a start on writing that. We already had some stuff in there, but a lot of it was unedited stuff from MoEML's. Rewrote it significantly with code blocks and clear instructions. It's a bit less discursive, but it should do the job for now, since JM needs it right away. Used the oXygen TEI P5 --> HTML conversion and then saved as a PDF.

Lemma matching

January 20th, 2018
The lemma matching code is now rewritten and rationalized; we no longer create a list of documents and apparatus. Instead, the transform uses a document collection (like MoEML's static build) and uses doc categories to determine whether or not a text needs to be tokenized. It's fairly fast and works quite well.


January 19th, 2018

Lots of work on the apparatus conversion. I've included MH's character code into the XSLT, which seems to be working well. Most of the plays are being handled well. One small issue is that all of the other annotations from the XWiki docs are being included as well, but those have already been converted to be inline on the documents. One solution would be to create a list of all the documents that don't need to be brought over.

Also began refactoring the process for attaching the standoff annotations to the texts. It's complicated business, since there's a lot of attempting to find the right documents to attach the annotations to. Currently, the process runs like so:

  1. Create a list of documents and their associated annotations
  2. Then iterate through that list
  3. Tokenize the base text and add ids to each character
  4. Attempt to match the apparatus files to the base text using character ids
  5. Then, add anchors in the base text where the apparatus ought to attach
  6. Finally, untokenize the text and just leave the anchors

A better and more flexible process might be to fork on type of text using the ISE document types. If the document is a primary source, then tokenize; otherwise, leave it. Then, for any apparatus document, see which document it is attempting to match (encoded in its relatedItem in the header) and look for the tokenized version. It will take longer to implement, but it is simpler than nested for-each lists in ANT.

Regardless, the match_lemma module was (as MH rightly noticed) complicated and difficult to debug. I've refactored it now into multiple functions and added a "verbose" switch for very detailed bug reports. There's still lots of fine-tuned error checking and documenting to be done, but it makes more sense than it did before.

ISE3: Getting familiar with the build processes

January 18th, 2018

I've been doing some modularization of the ant build processes in the ISE3 repo, and taking the opportunity to get familiar with JT's work so far. I'm going to start work on the HTML output next, leaving the annotation/collation stuff to him.

Merge of facsimile work

January 17th, 2018

MT did a lot of work encoding the facsimiles using feature structures, and I've now merged that into the repo. It was messy because of the horrible tangle of "externals" we have, which are not really external; they're just local relative links. The external pointing from data/sch to sch did not update itself automatically; I had to delete the files in that folder and svn up to get it to refresh them. Annoying.

XWiki Annotations

January 11th, 2018
XWiki annotations are now embedded inline for the critical documents (i.e. documents in the crit directory) using the <note> element. Each altered file was diffed and checked for accuracy--there were a few instances of bad pointers (it seems that a few TLNs changed in some of the documents, so the entire annotation set was off by 2-3 TLNs) which I had to hand-fix.


January 3rd, 2018

Met with JJ and JT. Results: I've overwritten the current data/text versions of the test plays with my own generated versions, which now have glyph encoding; in the case of H5, I actually ran the glyph processing on the existing versions since JM confirms he's been editing those directly (although not since July). JT has brought back some of the collation/annotation offset-checking code from GitHub so we can integrate that. We've decided that we should not hard-convert editor-specified offsets using string matches into anchors until rendering time, because it's easier for editors to work with them if they're text-based, but we'll provide easy checking for editors.

More work on conversion

December 18th, 2017

Found some more minor validity issues in the output from the conversion; dropped Titus (not ready for prime time), and worked on the remaining six until all were valid. We're now ready to look at annotations and collations.

Full integration of char work; more refinement of conversion

December 15th, 2017

The character work I'd done and tested actually wasn't getting called in the build process because of a pre-existing set of templates that were converting some of the entities. Took out the old conversion templates and did a bit more work on mine, and finally H5 was correctly converting. Added in four more of the test plays, did some more tweaking and fixed some errors in the original IML, and then added in Timon and Twelfth Night to make a set of seven plays. These are now not only validating against tei_all but also against the ISE3 schema. We're making progress. Next is Titus, after which we'll move on to annotations and collations.

Character work integrated into conversion process

December 14th, 2017

Ironed out the last of the bugs, and tested with the file pilot works; they threw up a couple of other bugs in the conversion, which I've also fixed. I think we're now ready to move forward.

More work on characters

December 12th, 2017

There are some very weird things in the IML. I have thirty-odd weird entities left to figure out, but they're getting stranger and stranger. At some point we'll reach diminishing returns, so I'll just put something in the output that requires human intervention.

Work on characters

December 8th, 2017

I've worked through most of the curly-braced entities and built on the taxonomy JT created for glyphs, splitting it into two (ligatures and single chars), just to aid in clarity; I've created equivalences in the form of choice elements in a list in a TEI file, and that can be plugged into any transformation. The idea would be to order the items by length descending, so the long ones are done first, and have a template that matches text() and runs all the equivalent replace things using analyze-string. We may have to build the analyze-string element mechanically. Alternatively, we could just run a scripted CLI replace thing.
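One way to sketch that template (file, element, and mode names are illustrative; matching whole curly-brace groups with a single regex also sidesteps the longest-first ordering issue for this entity style):

```xml
<!-- Sketch: match any curly-braced entity in a text node and look it
     up in the equivalence list; unknown entities are flagged for
     human attention. Note the doubled braces required in an XSLT
     regex attribute. -->
<xsl:variable name="equivs" select="doc('glyphs.xml')//mapping"/>

<xsl:template match="text()" mode="chars">
  <xsl:analyze-string select="." regex="\{{[^}}]+\}}">
    <xsl:matching-substring>
      <xsl:variable name="this" select="."/>
      <xsl:variable name="m" select="$equivs[@from = $this][1]"/>
      <xsl:choose>
        <xsl:when test="exists($m)">
          <xsl:value-of select="$m/@to"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Unknown entity: pass through, flagged for a human. -->
          <seg type="unknownEntity">
            <xsl:value-of select="."/>
          </seg>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
```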

Plan for identifiers; some work on figuring out how the transformations have changed

December 6th, 2017

Came up with a detailed plan for identifiers, for consideration by JT and JJ; then started looking at the stuff that's been done to the original transformation process I had running. The changes are mostly mysterious and undocumented.

Some work on id listing

December 5th, 2017

Crucial for the RAs to work is the list of ids produced by Jenkins, so I'm working on that. I think I'm done for the moment, but there are lots of issues with the actual structure of ids to deal with.

Back at work on ISE3

December 4th, 2017

Finally got back to this project, and started moving forward. Met with JT on Hangouts and set up a basic plan and some tasks. Cleaned up the repo a bit, and made a couple of changes on Jenkins (more to do there).

Going forward

October 19th, 2017
Comprehensive meeting today about the next steps in the ISE3 project. Biggest issues right now, for me at least, are some of the anthology-like texts and, as I realized later, reconciling annotations with the body of the XWiki texts.

Meeting on next steps

October 19th, 2017

LONG meeting, with decisions in notes and in Asana.

Schematron for Saxon HE / XSLT 3.0

September 26th, 2017
The ANT schematron task fails with the new Saxon 9 HE jar, so MT made a new ant-macro for schematron. It's quite a bit more verbose than the old one, but it seems to work well. It can be found at trunk/utilities/schematron/ant-macro.xml.

SFO meeting

September 22nd, 2017

Skype meeting on SFO project. Outcomes documented in email.

Meeting on ISE3

September 21st, 2017

Priorities per meeting discussion:

  • Make all XML valid so build process is useful again (JT).
  • Call a freeze on XWiki work (JJ) and convert XWiki pages (JT).
  • Fix errors in biblio code from XWiki sources (an RA working for one of us).
  • Finish documentation of conversion of texts, and iron out a few issues, then convert the texts (MT).

Review of encoding proposal

September 6th, 2017

Spent some time reviewing JT's encoding proposal for SFO; it's coming together.


July 12th, 2017
Logging work from yesterday and today. Looked at MT's changes to the SGML-TEI conversion, made some comments, and discussed ligatures with MT. Also started to work on some documentation for annotations and collations; editors want to start working on the files.

Meeting with SFO

July 4th, 2017

They have prepared ontology/taxonomy spreadsheets, and JT will turn those into TEI taxonomies before we start mapping them onto elements and attributes.

Wrapping up 1st campaign for ISE3

June 30th, 2017
Discussed what needs to be done regarding the ODD and summarized the work over the last week before vacation.

ISE3 Templates

June 29th, 2017

Met with JJ and MT today for first walk-through of initial work, which went well. All of the ANT builds are currently working on Windows, which is great. Also finished up some templates (we decided to move from the regular project templates to ANT builds, which gives us a bit more control over where the documents end up).

We also decided today to hold off on doing the editor-specific ODD; it's all a bit tricky with potentially different frameworks for TEI because of different versions of oXygen. We can almost do the same thing in schematron anyway.

ISE3 Editor packages

June 28th, 2017

Been busy trying to get everything finished before vacation, so I've been forgetting to blog. Summary of minutes below.

Editor packages should be good to go. Everyone has schemas, xprs, and tools in their project files, with the tools doing fairly decent lemma checking. I'm fairly confident that the ODD is good enough to start working with. I was able to get annotation and collation templates ready as well. I also added DJ's files from a few years ago to the repo and started chipping away at them; they're good and interesting TEI experiments that need a fair bit of wrangling to get into ISE3 TEI. A good exercise, though, since it's contributing greatly to the ODD and schematron.

Minutes summary: 23: 330min; 27: 360min; 28: 540min (360 in office)

ISE3: Editor package

June 26th, 2017
Discussed issues regarding witnesses with JJ, MH, and MT today and hashed out exclusion/inclusion of witnesses. Sorted that out, got the ODD + taxonomies transform working and then I started work on a new SGML->TEI build that creates a full edition package for editors. Almost done--just need to create the XPR file for each package. Next will be the editor build tools. Edit: Worked an extra 180 minutes at home getting editor build tools up and running.


June 22nd, 2017
Finished documenting the lemma checker and discussed editor tools with MT. Also continued work on the ODD and discussed best linking practice in terms of docs and TLNs. MT and I decided on a few prefixDefs, the main ones being:
  • ident="doc" | matchPattern="(.+)(#.+)?" | replacementPattern="$1$2". This gets us around TLNs not yet having explicit xml:ids (which we also decided will only be local to the document, not project-wide)
  • ident="tln" | matchPattern="(.+)" | replacementPattern="iseH5_FM.xml#tln-$1". This will only be used in the context of @to and @from in the apparatus files. The prefix will be defined for each apparatus/collation document so that the link will refer to a specific TLN in the text file (i.e. iseH5_FM will change).
  • And simple ones to refer to people, document types, and glyphs
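As a rough illustration of how a prefixDef's matchPattern/replacementPattern pair resolves a prefixed pointer, here is a Python sketch (the resolve_prefix helper is hypothetical, not part of any TEI tooling):

```python
import re

def resolve_prefix(pointer, match_pattern, replacement_pattern):
    """Resolve a prefixed pointer the way a TEI prefixDef would:
    match the part after the prefix and expand the replacement."""
    prefix, _, locator = pointer.partition(":")
    m = re.fullmatch(match_pattern, locator)
    if m is None:
        return None
    # TEI replacementPattern uses $1, $2 ... for captured groups;
    # translate that to Python's \1, \2 backreference syntax.
    template = re.sub(r"\$(\d)", r"\\\1", replacement_pattern)
    return m.expand(template)

# The tln prefixDef from the notes above:
print(resolve_prefix("tln:427", "(.+)", "iseH5_FM.xml#tln-$1"))
# iseH5_FM.xml#tln-427
```

Each apparatus document would pass in its own replacementPattern, so the same tln: pointer resolves against a different text file per document.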

Lemma matching

June 21st, 2017
Revamped lemma matching code to make it much more efficient. It has now been split into 3 XSLTs, chained together in a build process. The lemma match function has also been rewritten so that it reports errors more accurately.

Discussions on lemma matching

June 21st, 2017

Posting time spent discussing lemma matching and milestone insertion over the past two days.

Lemma matching and ODD tutorial

June 20th, 2017
Met with MH and MT about ODD creation. We've decided that we're going to try to do most of the documentation in the ODD itself and create ISE-TEI guidelines from the standard transform (and wrap them in the ISE's styling). Most of the day was spent working on the apparatus matching code, which has preoccupied my thoughts for a while. I have an XSLT in the Git repo that matches lemmas and seems to be working; it's finding errors that are truly errors (incorrect ranges, bad characters, etc.). The process is:
  • Tokenize the entire source text into 'c' elements with generated @xml:ids
  • Look at a TLN and see if we can find the right following characters that string together the proper phrase
  • If there's a match, add it to the @to/@from attributes in the span/app (depending on the context)
  • Then, in a final pass, get rid of all the c elements and add anchors if there is an apparatus entry that references the character xml:id
There's a lot of working with preceding nodes and ensuring characters follow the right TLN, and all the nodes are processed twice (first to find the beginning anchor and then again to find the ending anchor). This isn't the most efficient, but I think it will work out well. The next step is to integrate this into an editor build tool as a diagnostic.
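The core of the matching process can be sketched in Python (the token ids and the match_lemma helper are illustrative stand-ins, not the actual XSLT):

```python
def tokenize(text):
    # Mimic wrapping each character in a 'c' element with a generated id.
    return [("c{}".format(i), ch) for i, ch in enumerate(text)]

def match_lemma(tokens, lemma, start=0):
    """Find `lemma` in the tokenized text at or after offset `start`;
    return the (from_id, to_id) anchor pair, or None on no match."""
    chars = "".join(ch for _, ch in tokens)
    pos = chars.find(lemma, start)
    if pos == -1:
        return None
    return tokens[pos][0], tokens[pos + len(lemma) - 1][0]

tokens = tokenize("To be, or not to be")
print(match_lemma(tokens, "or not"))  # ('c7', 'c12')
```

In the real pipeline, `start` would be the position of the relevant TLN milestone, and the returned ids would feed the @from/@to attributes before the c elements are stripped in the final pass.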

Annotations and collations

June 19th, 2017
Long day with lots of work done on annotations and collations. We have a fairly solid structure up and running. ise:rdg/@resp is becoming tei:rdg/@wit, which will point to a tei:witness/@xml:id in the header of the document that @corresps to a series, etc. Annotations have become a list of tei:notes with spans, glosses, and other notes. Still have to figure out iembeds, which will be worthwhile when converting the XWiki documents. This is all coming together nicely.

ISE3 mtg (JJ, MT, DJ)

June 17th, 2017
Had a whole day meeting with JJ, MT, and DJ to go over ISE3 implementation. Lots of great stuff discussed, most of which is documented through Asana and the GitHub repo. And good headway made with annotations and collations; I think we've discussed it enough to start writing some code that processes annotations and collations.

Work on Annotations

June 16th, 2017
Talked at length with MT about annotations and collations. He gave me a good run-through of how apparatuses work and we made some progress thinking about how it will be implemented. We're still not sure about which method of annotation is best for the project, particularly since we're sort of wedded to string-matching. We've been wading through the TEI guidelines trying to find the most appropriate method for attachment.

ISE3 Conversion (420+ 375)

June 15th, 2017
Post for June 14 and 15: working on ISE3 TEI conversion. A long time was spent trying to deal with Unicode characters that were being garbled in OSX--solved by creating a small XSLT for conversion that used analyze-string to tokenize each character and fn:string-to-codepoints to check whether the string should be escaped. Seems to be working well now. Discussed file structure with JJ and MT and came to the conclusion that each edition gets its own folder (with the ISE work id without the 'ise' prefix) that has documents, etc. Note: we are not future-proofing the ISE to think about more than one edition.
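A rough Python equivalent of the per-character codepoint check (the printable-ASCII cutoff and the numeric-character-reference output are assumptions, not necessarily what the XSLT does):

```python
def escape_non_ascii(text):
    # Tokenize per character (the XSLT used analyze-string) and check
    # each codepoint; anything outside printable ASCII is emitted as a
    # hexadecimal numeric character reference. Cutoff is an assumption.
    out = []
    for ch in text:
        cp = ord(ch)
        out.append(ch if 0x20 <= cp <= 0x7E else "&#x{:X};".format(cp))
    return "".join(out)

print(escape_non_ascii("Æneid"))  # &#xC6;neid
```

Keeping the output as explicit character references sidesteps any disagreement between the OS locale and the declared file encoding.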


June 13th, 2017
Added handling for glyphs and respStmts into the first and second passes (respectively) of the build process, which seem to be working well. Chatted with MT about linebreaks and he explained the various ways the ISE have used milestones and linebreaks. Summary of points made by MT:
  • TLNs, QLNs, L, and MS tags are all milestone units and not end of line breaks
  • Editors can make up their own types of milestones with their own numbering systems if they choose
  • White space line breaks (\n) are significant and represent the linebreaks for that particular edition
This poses some questions to ask MH and JJ at some point. Should all milestone-like things become milestones, with '\n's becoming <lb edRef="#thisEd"/>? MT also mentioned that since we're now using the ISE tools to manipulate the IML, we don't need the multiple regex conversions that we're currently using.
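If we went that route, the '\n'-to-lb step might look something like this sketch (the edRef value is just the example from the question above; nothing here is settled):

```python
def newlines_to_lb(text, ed_ref="#thisEd"):
    # Replace significant whitespace line breaks with an
    # edition-specific lb element, per the proposal above.
    lb = '<lb edRef="{}"/>'.format(ed_ref)
    return text.replace("\n", lb)

print(newlines_to_lb("But soft,\nwhat light"))
```

The milestone-like units (TLN, QLN, L, MS) would stay as milestone elements and be untouched by this pass.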


June 9th, 2017
Continued working with taxonomies, and spent some time with JJ formalizing and finishing the responsibility taxonomy. Didn't have time to encode it today, but it's in an easily processable form, so it should be quick to add to the taxonomies document. Also discussed glyphs and chars with MH and MT; we decided that only glyphs that had standard correlations (most ligatures) would be encoded in the taxonomy. Digraphs, accents, and other characters that are untranslatable would not.

Unicode, glyphs, and chars

June 8th, 2017
Working with MH and MT on figuring out special character stuff. MT already had some utilities built in to deal with characters encoded like {s} in the Unicode, so there's a starting point. I had to wrestle with the SGML to TEI code a bit to get it to work. Started to build a taxonomies doc in ISE3's SVN repo with the chars in it. Note: OSX's documentation seems to be false when it says that by default it encodes in UTF-8. It was producing wonky output unless you appended the argument: -b=UTF-8.


June 7th, 2017
More work done on the SGML to TEI conversion. Prose and verse seem to be handled well, and the creation of the listPers in the particDesc seems right as well. I began looking at the code mapping for ligatures and other special characters. Not sure what exactly needs to happen there. Created a bash script that looks through the IML files to investigate how often each character is used; I couldn't use XSLT or anything like that since the ampersands, etc., are difficult to search for as text. I now have a script that lists the characters by number of uses. It needs to be finished, however, so that we can see which files contain these characters and how we should work with them.
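A Python version of the counting idea (the entity regex is an assumption about what the IML entities look like; the bash script may tally things differently):

```python
import re
from collections import Counter

# Assumed shape of an SGML-style entity: &name; with a letter first.
ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9.-]*;")

def count_entities(text):
    # Tally entity occurrences, most frequent first. Working on the raw
    # text avoids the parsing problems the bare ampersands cause.
    return Counter(ENTITY.findall(text)).most_common()

print(count_entities("a &amp; b &amp; &aelig;"))
# [('&amp;', 2), ('&aelig;', 1)]
```

To finish the job described above, you would loop over the IML files with pathlib, merge the per-file Counters, and record which files each entity appears in.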

Coding standards meeting

June 6th, 2017

Met with MT and JT. We established coding and documentation standards for XSLT, Schematron, and Ant (to some degree), and figured out how the ISE2, ISE3 and GitHub repos will relate to each other and to the Jenkins build. Documentation in Github repo.

Setting up ISE3

June 6th, 2017
Worked on H5 and got it validating in TEI and worked a bit on the personography. Meeting with MH and MT discussing code standards, which was very interesting and helpful.


June 5th, 2017
Long day discussing and planning ISE3. We've come to the conclusion that Works are going to be a fairly simple taxonomy with short descriptors that will become landing pages. We also discussed code standards and set up a time to hash that out tomorrow. We'll start off with a few different types of docs and then create the ODD from those. We'll also hold off on creating the master character list until after more of the editions are in TEI.

Created new blog for ISE

June 5th, 2017

I hid the viHistory blog to keep the list length under control.