Lemma matching and ODD tutorial
Posted by jtakeda on 20 Jun 2017 in Activity log
Met with MH and MT about ODD creation. We've decided to try to do most of the documentation in the ODD itself, generate a set of ISE-TEI guidelines from the standard transform, and wrap the result in the ISE's styling.
Most of the day was spent working on the apparatus matching code, which has preoccupied my thoughts for a while. I have an XSLT in the Git repo that matches lemmas, and it seems to be working; the errors it finds are genuine errors (incorrect ranges, bad characters, etc.). The process is:
* Tokenize the entire source text into 'c' elements with generated @xml:ids
* Starting from a TLN, check whether the characters that follow string together into the proper phrase
* If there's a match, record it in the @from/@to attributes of the span or app (depending on the context)
* Then, in a final pass, remove all the c elements, adding an anchor wherever an apparatus entry references a character's xml:id
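As a rough illustration of the steps above, here is a minimal Python sketch. It is not the XSLT in the repo: the names (Char, tokenize, match_lemma, final_pass), the "c1, c2, ..." id scheme, and the choice to place each anchor before its character are all assumptions made for the example, and the source text is reduced to a plain mapping of TLN to line text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Char:
    """Hypothetical stand-in for one generated <c> element."""
    xml_id: str
    tln: int
    text: str


def tokenize(lines: dict[int, str]) -> list[Char]:
    """Wrap every character of every TLN in a Char with a generated id."""
    chars: list[Char] = []
    for tln in sorted(lines):
        for ch in lines[tln]:
            chars.append(Char(f"c{len(chars) + 1}", tln, ch))
    return chars


def match_lemma(chars: list[Char], tln: int, lemma: str) -> Optional[tuple[str, str]]:
    """Return (from_id, to_id) for the first run of characters that starts
    on the given TLN and spells out the lemma, or None if there is no match."""
    if not lemma:
        return None
    for start, c in enumerate(chars):
        if c.tln != tln:
            continue
        window = chars[start:start + len(lemma)]
        if "".join(w.text for w in window) == lemma:
            return window[0].xml_id, window[-1].xml_id
    return None  # a failed match is what the diagnostic would report


def final_pass(chars: list[Char], referenced: set[str]) -> str:
    """Drop the Char wrappers, keeping an anchor only where an id is referenced.
    Placing the anchor before the character is an assumption of this sketch."""
    out = []
    for c in chars:
        if c.xml_id in referenced:
            out.append(f'<anchor xml:id="{c.xml_id}"/>')
        out.append(c.text)
    return "".join(out)


chars = tokenize({1: "Who's there?", 2: "Nay, answer me."})
print(match_lemma(chars, 1, "there"))   # a (from, to) pair of ids
print(match_lemma(chars, 1, "anser"))   # None: a lemma with a typo
```

A lemma that returns None here corresponds to the kind of error the matching is meant to surface: a phrase that cannot be found starting from the TLN it claims to belong to.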
There's a lot of working with preceding nodes and ensuring that characters follow the right TLN, and every node is processed twice (first to find the beginning anchor, then again to find the ending anchor). This isn't the most efficient approach, but I think it will work out well. The next step is to integrate this into an Editor build tool as a diagnostic.