Permalink 04:52:22 pm, by mholmes, 6 words, 22 views   English (CA)
Categories: Activity log; Mins. worked: 15

Repo and blog renamed

...in keeping with our forward planning.


Permalink 04:12:21 pm, by mholmes, 18 words, 16 views   English (CA)
Categories: Activity log; Mins. worked: 15

Ran Rom F1 and Mac F1 through SGML-to-TEI...

...for the benefit of other people who will be working on related texts. Results were valid, with no errors.


Permalink 10:51:49 pm, by jtakeda, 103 words, 23 views   English (CA)
Categories: Activity log; Mins. worked: 120

Customizing ODD

I'm beginning work on customizing the ISE3 ODD so that we can have a "standoff" element to store all of the database-like material that the rest of the Endings projects have been putting in the teiHeader. It is based on the stdf proposal, but is less concerned with linguistic annotation.

Basically, the standoff element contains model.listLike and listBibl (which is part of model.biblLike) and spanGrp (for annotations).

Note to self: Adding the standoff element (or any custom namespace element) between the teiHeader and the TEI element requires adding that namespace to the @defaultExceptions attribute in the schemaSpec element.
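The ODD change might look something like the sketch below. This is purely illustrative: the namespace URI, the schemaSpec @ident, and the content model details are assumptions, not the project's actual customization. Note how the custom namespace is appended to @defaultExceptions (whose TEI default is "http://www.tei-c.org/ns/1.0 teix:egXML") so the standoff element can sit between the teiHeader and the text.

```xml
<!-- Hypothetical sketch only: namespace URI and ident are illustrative. -->
<schemaSpec ident="ise3" start="TEI"
            defaultExceptions="http://www.tei-c.org/ns/1.0 teix:egXML http://ise.uvic.ca/ns">
  <elementSpec ident="standoff" ns="http://ise.uvic.ca/ns" mode="add">
    <content>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <classRef key="model.listLike"/>
        <elementRef key="listBibl"/>
        <elementRef key="spanGrp"/>
      </alternate>
    </content>
  </elementSpec>
</schemaSpec>
```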


Permalink 07:04:04 pm, by jtakeda, 58 words, 31 views   English (CA)
Categories: Activity log; Mins. worked: 60

Combining builds

Combined the two build files that we had in the ISE3 repo (ise3/diagnostics.xml and ise3/site/build.xml) and their associated ant_globals.xml files. It's now one build process, which by default:

  1. Validates the TEI in ise3/data
  2. Runs diagnostics on the TEI in ise3/data
  3. Begins the static build process
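The combined build could be sketched as a chain of dependent Ant targets, roughly as below. Target names and file layout here are assumptions for illustration, not the repo's actual build file.

```xml
<!-- Hypothetical sketch of the combined build; target names are illustrative. -->
<project name="ise3" default="all" basedir=".">
  <target name="validate">
    <!-- RELAX NG validation of the TEI in data/ would go here -->
  </target>
  <target name="diagnostics" depends="validate">
    <!-- diagnostics transforms over data/ -->
  </target>
  <target name="static" depends="diagnostics">
    <!-- static site build -->
  </target>
  <target name="all" depends="validate,diagnostics,static"/>
</project>
```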


Permalink 03:34:10 pm, by jtakeda, 76 words, 33 views   English (CA)
Categories: Activity log; Mins. worked: 120

Standard XML

Working on the creation of the Standard XML, which for now means resolving pointers. Since the ISE has decided to use more granular prefixDefs (e.g. 'doc' for documents, 'pers' for persons) instead of general ones (as MoEML does), prefix resolution can be more generalized. There's a template that matches every TEI attribute with a pointer data-type and, like the Endings diagnostics code, resolves the pointer against the prefixDefs. Seems to be working well.
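A minimal sketch of that kind of resolution, assuming prefixDefs of the standard TEI form (matchPattern/replacementPattern pairs); the attribute list and key name are illustrative, not the actual template's match pattern:

```xml
<!-- Hypothetical sketch; assumes prefixDef elements are in scope
     and xpath-default-namespace is set to the TEI namespace. -->
<xsl:key name="prefixes" match="prefixDef" use="@ident"/>

<xsl:template match="@ref | @target | @corresp">
  <xsl:variable name="prefix" select="substring-before(., ':')"/>
  <xsl:variable name="def" select="key('prefixes', $prefix)"/>
  <xsl:attribute name="{name()}">
    <xsl:choose>
      <xsl:when test="$def">
        <!-- Apply the prefixDef's regex to the local part of the pointer -->
        <xsl:value-of select="replace(substring-after(., ':'),
                              $def/@matchPattern, $def/@replacementPattern)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:attribute>
</xsl:template>
```

With a prefixDef like `<prefixDef ident="pers" matchPattern="(.+)" replacementPattern="persons.xml#$1"/>`, a pointer such as `pers:SHAK1` would resolve to `persons.xml#SHAK1`.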


Permalink 02:35:02 pm, by mholmes, 23 words, 45 views   English (CA)
Categories: Activity log; Mins. worked: 30

A bit of progress on the XSLT for ISE3 output

Wrote a utility function for retrieving data from the taxonomies; this will be needed to complete the Dublin Core metadata in the pages.
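Such a function might look like the following sketch; the function name, namespace prefix, and global variable are assumptions for illustration:

```xml
<!-- Hypothetical sketch: names are illustrative, not the actual utility. -->
<xsl:function name="ise:getCatDesc" as="xs:string">
  <xsl:param name="catId" as="xs:string"/>
  <xsl:sequence
    select="normalize-space($taxonomies//category[@xml:id = $catId][1]/catDesc[1])"/>
</xsl:function>
```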


Permalink 04:23:58 pm, by mholmes, 135 words, 44 views   English (CA)
Categories: Activity log; Mins. worked: 120

Starting work on the HTML output

I've decided we should build the HTML pages from a genuine template, so that anyone who knows HTML can easily edit such things as the menu items and the boilerplate content. I've set one up, and given it a basic flex-based CSS layout that shouldn't be too hard for later styling. I'm thinking about building in the small-format device rulesets from the beginning, so they don't end up being grafted on later. The basic process would be to load the template, and process it through XSLT templates, with the source XML document passed as a tunnelled parameter; that should mean we can pull anything we like from the source XML fairly easily, and meanwhile most of the boilerplate stuff will just fall through in an identity transform. XML will be processed under a distinct mode.
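The template-plus-tunnel approach might be sketched like this; the mode names, template file name, and placeholder id are assumptions, not the project's actual code:

```xml
<!-- Hypothetical sketch of the template-driven build. -->
<xsl:template name="buildPage">
  <xsl:apply-templates select="doc('template.html')" mode="html">
    <xsl:with-param name="sourceXml" select="/" tunnel="yes"/>
  </xsl:apply-templates>
</xsl:template>

<!-- Boilerplate falls through in an identity transform -->
<xsl:template match="@* | node()" mode="html">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" mode="html"/>
  </xsl:copy>
</xsl:template>

<!-- A placeholder in the template is filled from the tunnelled source XML,
     processed under a distinct mode -->
<xsl:template match="*[@id = 'content']" mode="html">
  <xsl:param name="sourceXml" tunnel="yes"/>
  <xsl:copy>
    <xsl:apply-templates select="$sourceXml//text" mode="tei"/>
  </xsl:copy>
</xsl:template>
```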


Permalink 03:28:00 pm, by jtakeda, 71 words, 41 views   English (CA)
Categories: Activity log; Mins. worked: 120

Subversion documentation

JM needed documentation for Subversion, so I got a start on writing it. We already had some material, but much of it was unedited text carried over from MoEML's documentation. Rewrote it significantly with code blocks and clear instructions. It's a bit less discursive, but it should do the job for now, since JM needs it right away. Used the oXygen TEI P5 --> HTML conversion and then saved the result as a PDF.


Permalink 09:26:58 pm, by jtakeda, 52 words, 45 views   English (CA)
Categories: Activity log; Mins. worked: 180

Lemma matching

The lemma matching code is now re-written and rationalized; we no longer create a list of documents and apparatus. Instead, the transforms use a document collection (like MoEML's static build) and use doc categories to determine whether or not a text needs to be tokenized. It's fairly fast and works quite well.
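The collection-plus-category fork could be sketched roughly as below; the collection URI and the category value are assumptions for illustration:

```xml
<!-- Hypothetical sketch; collection path and category target are illustrative. -->
<xsl:variable name="docs" select="collection('../data/?select=*.xml')"/>

<xsl:for-each select="$docs">
  <xsl:choose>
    <xsl:when test="descendant::catRef[@target = 'idt:idtPrimary']">
      <!-- primary source: tokenize it for lemma matching -->
      <xsl:apply-templates select="." mode="tokenize"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- anything else passes through untouched -->
      <xsl:sequence select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>
```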


Permalink 03:11:40 pm, by jtakeda, 305 words, 45 views   English (CA)
Categories: Activity log; Mins. worked: 180

Apparatus conversion

Lots of work on the apparatus conversion. I've incorporated MH's character code into the XSLT, which seems to be working well, and most of the plays are being handled correctly. One small issue is that all of the other annotations from the XWiki docs are being included as well, even though those have already been converted to inline annotations in the documents. One solution would be to create a list of all the documents that don't need to be brought over.

Also began refactoring the process for attaching the standoff annotations to the texts. It's a complicated business, since much of the work is finding the right documents to attach the annotations to. Currently, the process runs like so:

  1. Create a list of documents and their associated annotations
  2. Iterate through that list
  3. Tokenize the base text and add ids to each character
  4. Attempt to match the apparatus files to the base text using the character ids
  5. Add anchors in the base text where the apparatus ought to attach
  6. Untokenize the text, leaving just the anchors
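The tokenizing and untokenizing steps might be sketched like this; the wrapper element, id scheme, and mode names are assumptions for illustration:

```xml
<!-- Hypothetical sketch of per-character tokenization. -->
<xsl:template match="text()" mode="tokenize">
  <xsl:variable name="node" select="."/>
  <xsl:for-each select="string-to-codepoints(.)">
    <!-- each character gets a stable, addressable id -->
    <c xml:id="{generate-id($node)}_{position()}">
      <xsl:value-of select="codepoints-to-string(.)"/>
    </c>
  </xsl:for-each>
</xsl:template>

<!-- Untokenizing drops the wrappers but keeps any anchors added between them -->
<xsl:template match="c" mode="untokenize">
  <xsl:value-of select="."/>
</xsl:template>
```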

A better and more flexible process might be to fork on the type of text using the ISE document types. If the document is a primary source, tokenize it; otherwise, leave it alone. Then, for each apparatus document, see which document it is attempting to match (encoded in its relatedItem in the header) and look for the tokenized version of that document. It may take longer to run, but it is simpler than nested for-each lists in Ant.

Regardless, the match_lemma module was (as MH rightly noticed) complicated and difficult to debug. I've refactored it now into multiple functions and added a "verbose" switch for very detailed debugging reports. There's still lots of fine-grained error checking and documentation to be done, but it makes more sense than it did before.



Linked Early Modern Drama Online
