Permalink 02:35:02 pm, by mholmes, 23 words, 15 views   English (CA)
Categories: Activity log; Mins. worked: 30

A bit of progress on the XSLT for ISE3 output

Wrote a utility function for retrieving data from the taxonomies; this will be needed to complete the Dublin Core metadata in the pages.
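The real utility is an XSLT function, but the lookup logic can be sketched in Python; the taxonomy markup, file contents, and function name below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy file contents; the real structure may differ.
TAXONOMY_XML = """
<taxonomies>
  <taxonomy xml:id="docTypes">
    <category xml:id="idtPrimarySource">
      <catDesc>Primary source text</catDesc>
    </category>
  </taxonomy>
</taxonomies>
"""

def get_cat_desc(root, cat_id):
    """Return the catDesc text for a category id, or None if absent."""
    for cat in root.iter("category"):
        if cat.get("{http://www.w3.org/XML/1998/namespace}id") == cat_id:
            desc = cat.find("catDesc")
            return desc.text if desc is not None else None
    return None

root = ET.fromstring(TAXONOMY_XML)
print(get_cat_desc(root, "idtPrimarySource"))  # Primary source text
```

A function like this would feed the Dublin Core fields by resolving category pointers in each document's header against the shared taxonomy.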


Permalink 04:23:58 pm, by mholmes, 135 words, 17 views   English (CA)
Categories: Activity log; Mins. worked: 120

Starting work on the HTML output

I've decided we should build the HTML pages from a genuine template, so that anyone who knows HTML can easily edit such things as the menu items and the boilerplate content. I've set one up, and given it a basic flex-based CSS layout that shouldn't be too hard for later styling. I'm thinking about building in the small-format device rulesets from the beginning, so they don't end up being grafted on later. The basic process would be to load the template, and process it through XSLT templates, with the source XML document passed as a tunnelled parameter; that should mean we can pull anything we like from the source XML fairly easily, and meanwhile most of the boilerplate stuff will just fall through in an identity transform. XML will be processed under a distinct mode.
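The real transform is XSLT, with the source document passed as a tunnelled parameter and boilerplate falling through an identity template; the idea can be sketched in Python, with the placeholder id and element names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented template and source documents for illustration.
TEMPLATE = ('<html><head><title id="docTitle"/></head>'
            '<body><p>Boilerplate menu and footer</p></body></html>')
SOURCE = '<TEI><titleStmt><title>Henry V</title></titleStmt></TEI>'

def fill(node, source):
    """Identity-copy the template; known placeholders pull from the source XML.
    In the XSLT version, unmatched nodes fall through an identity template and
    the source document is available everywhere as a tunnelled parameter."""
    out = ET.Element(node.tag, node.attrib)
    out.text, out.tail = node.text, node.tail
    if node.get("id") == "docTitle":  # a placeholder slot in the template
        out.text = source.findtext(".//title")
    for child in node:
        out.append(fill(child, source))
    return out

result = fill(ET.fromstring(TEMPLATE), ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

The benefit of this shape is that the template stays valid, hand-editable HTML: anyone can change the menu or footer without touching the XSLT.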


Permalink 03:28:00 pm, by jtakeda, 71 words, 15 views   English (CA)
Categories: Activity log; Mins. worked: 120

Subversion documentation

JM needed documentation for Subversion, so I got a start on writing it. We already had some material in there, but a lot of it was unedited content from MoEML's documentation. I rewrote it significantly, with code blocks and clear instructions. It's a bit less discursive, but it should do the job for now, since JM needs it right away. I used the oXygen TEI P5 --> HTML conversion and then saved the result as a PDF.


Permalink 09:26:58 pm, by jtakeda, 52 words, 19 views   English (CA)
Categories: Activity log; Mins. worked: 180

Lemma matching

The lemma matching code is now rewritten and rationalized; we no longer create a list of documents and apparatus. Instead, the transforms use a document collection (like MoEML's static build) and use document categories to determine whether or not a text needs to be tokenized. It's fairly fast and works quite well.
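In Python terms the category fork looks something like the following; the real build does this in XSLT over a collection, and the category id used here is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented documents: a play classed as a primary source, and an apparatus doc.
PLAY = '<TEI><catRef scheme="#docTypes" target="#idtPrimarySource"/><text>...</text></TEI>'
ANNOS = '<TEI><catRef scheme="#docTypes" target="#idtAnnotations"/><text>...</text></TEI>'

def needs_tokenizing(doc_root):
    """Only primary-source texts get tokenized; other docs pass through."""
    return any(ref.get("target") == "#idtPrimarySource"
               for ref in doc_root.iter("catRef"))

for raw in (PLAY, ANNOS):
    print(needs_tokenizing(ET.fromstring(raw)))  # True, then False
```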


Permalink 03:11:40 pm, by jtakeda, 305 words, 18 views   English (CA)
Categories: Activity log; Mins. worked: 180


Lots of work on the apparatus conversion. I've incorporated MH's character code into the XSLT, which seems to be working well; most of the plays are being handled properly. One small issue is that all of the other annotations from the XWiki docs are being included as well, even though those have already been converted to inline annotations in the documents. One solution would be to create a list of the documents that don't need to be brought over.

Also began refactoring the process for attaching the standoff annotations to the texts. It's a complicated business, since much of the work is in finding the right documents to attach the annotations to. Currently, the process runs like so:

  1. Create a list of documents and their associated annotations
  2. Iterate through that list
  3. Tokenize the base text, adding an id to each character
  4. Attempt to match the apparatus files to the base text using the character ids
  5. Add anchors in the base text where the apparatus ought to attach
  6. Untokenize the text, leaving only the anchors
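The steps above can be sketched in Python (the project's real code is XSLT; the id scheme, anchor markup, and function names here are invented):

```python
def tokenize(text):
    """Step 3: give every character an id so the apparatus can address offsets."""
    return [(f"c{i}", ch) for i, ch in enumerate(text)]

def find_lemma(tokens, lemma):
    """Step 4: locate a lemma in the tokenized text as (start_id, end_id)."""
    chars = "".join(ch for _, ch in tokens)
    start = chars.find(lemma)
    if start == -1:
        return None
    return tokens[start][0], tokens[start + len(lemma) - 1][0]

def untokenize_with_anchors(tokens, anchors):
    """Steps 5-6: emit plain text with anchor markers, dropping the char ids."""
    out = []
    for cid, ch in tokens:
        if cid in anchors:
            out.append(f'<anchor xml:id="{anchors[cid]}"/>')
        out.append(ch)
    return "".join(out)

tokens = tokenize("To be or not to be")
span = find_lemma(tokens, "not")
anchors = {span[0]: "app1_start"}
print(untokenize_with_anchors(tokens, anchors))
```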

A better and more flexible process might be to fork on the type of text using the ISE document types. If the document is a primary source, tokenize it; otherwise, leave it alone. Then, for any apparatus document, see which document it is attempting to match (encoded in its relatedItem in the header) and look for the tokenized version of that document. It may take longer to run, but it is simpler than nested for-each lists in ANT.
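The relatedItem lookup described above might amount to something like this Python sketch; the header shape, attribute name, and tokenized-file path are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical apparatus header pointing at its base text.
APPARATUS = ('<TEI><teiHeader><sourceDesc>'
             '<relatedItem target="doc_H5.xml"/>'
             '</sourceDesc></teiHeader></TEI>')

def target_of(app_root):
    """Read which base text this apparatus file annotates from its relatedItem."""
    item = app_root.find(".//relatedItem")
    return item.get("target") if item is not None else None

def tokenized_path(app_root):
    """Resolve the tokenized version of the target text."""
    target = target_of(app_root)
    return f"tokenized/{target}" if target else None

print(tokenized_path(ET.fromstring(APPARATUS)))  # tokenized/doc_H5.xml
```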

Regardless, the match_lemma module was (as MH rightly noticed) complicated and difficult to debug. I've now refactored it into multiple functions and added a "verbose" switch for very detailed debugging reports. There's still a lot of fine-grained error checking and documentation to be done, but it makes more sense than it did before.


Permalink 04:37:38 pm, by mholmes, 43 words, 15 views   English (CA)
Categories: Activity log; Mins. worked: 60

ISE3: Getting familiar with the build processes

I've been doing some modularization of the ant build processes in the ISE3 repo, and taking the opportunity to get familiar with JT's work so far. I'm going to start work on the HTML output next, leaving the annotation/collation stuff to him.


Permalink 10:04:01 am, by mholmes, 74 words, 13 views   English (CA)
Categories: Activity log; Mins. worked: 40

Merge of facsimile work

MT did a lot of work encoding the facsimiles using feature structures, and I've now merged that into the repo. It was messy because of the horrible tangle of "externals" we have, which are not really external; they're just local relative links. The external pointing from data/sch to sch did not update itself automatically; I had to delete the files in that folder and svn up to get it to refresh them. Annoying.


Permalink 09:59:11 am, by jtakeda, 66 words, 16 views   English (CA)
Categories: Activity log; Mins. worked: 210

XWiki Annotations

XWiki annotations are now embedded inline in the critical documents (i.e. documents in the crit directory) using the <note> element. Each altered file was diffed and checked for accuracy; there were a few instances of bad pointers (it seems that a few TLNs changed in some of the documents, so the entire annotation set was off by 2-3 TLNs) which I had to fix by hand.


Permalink 04:08:59 pm, by mholmes, 108 words, 19 views   English (CA)
Categories: Activity log; Mins. worked: 120


Met with JJ and JT. Results: I've overwritten the current data/text versions of the test plays with my own generated versions, which now have glyph encoding; in the case of H5, I ran the glyph processing on the existing versions, since JM confirms he's been editing those directly (although not since July). JT has brought back some of the collation/annotation offset-checking code from GitHub so we can integrate it. We've decided that we should not hard-convert the editors' string-match offsets into anchors until rendering time, because text-based offsets are easier for editors to work with, but we'll provide easy checking for editors.
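The promised editor check might amount to no more than verifying that a quoted string resolves to exactly one spot in the text; a minimal Python sketch, with the function name invented:

```python
def check_quoted_offset(text, quoted):
    """Report whether an editor's quoted string matches exactly once."""
    n = text.count(quoted)
    if n == 0:
        return "not found"
    if n > 1:
        return f"ambiguous: {n} matches"
    return "ok"

base = "To be or not to be, that is the question"
print(check_quoted_offset(base, "question"))  # ok
print(check_quoted_offset(base, "be"))        # ambiguous: 2 matches
print(check_quoted_offset(base, "zzz"))       # not found
```

Ambiguous matches are the interesting case: they are exactly the ones that would silently attach an annotation to the wrong spot if converted blindly at rendering time.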


Permalink 05:00:22 pm, by mholmes, 38 words, 19 views   English (CA)
Categories: Activity log; Mins. worked: 90

More work on conversion

Found some more minor validity issues in the output from the conversion; dropped Titus (not ready for prime time), and worked on the remaining six until all were valid. We're now ready to look at annotations and collations.



Internet Shakespeare Editions

