JT finished his code last night, and we looked it over this morning and discussed future plans for useful XSLT over the summer, including the possibility of writing an Oxygen ant task to import contributor work from docx and convert it to a base TEI file for correction and enhancement. This would cover lots of useful ground: ant, XSLT, ant scenarios in Oxygen, word-processor document formats, and a lot of other meaty things.
Spent some time working with JT, who's beginning his XSLT learning process, to get a sort of diagnostics stylesheet started which will identify good candidates for location on the map.
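For the record, here's a minimal sketch of the kind of thing we're aiming at, assuming TEI input and that un-gazetteered names appear as placeName elements without a @ref attribute (the real stylesheet's element and attribute choices may well differ):

<!-- Sketch only: report distinct placeName strings that lack a @ref,
     ordered by how often they occur, as a simple XHTML list. -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xhtml" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head><title>Candidate locations</title></head>
      <body>
        <h1>placeName elements with no @ref</h1>
        <ul>
          <xsl:for-each-group select="//tei:placeName[not(@ref)]"
                              group-by="normalize-space(.)">
            <xsl:sort select="count(current-group())" order="descending"/>
            <li>
              <xsl:value-of select="current-grouping-key()"/>
              <xsl:text> (</xsl:text>
              <xsl:value-of select="count(current-group())"/>
              <xsl:text> occurrences)</xsl:text>
            </li>
          </xsl:for-each-group>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

Grouping the strings means each candidate is reported only once, with a count, so the most frequently mentioned names float to the top of the list.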
Sunday night and today, replaced the old map with the new map in all contexts; changed the landing page for the map to a standard MoEML document; removed "experimental" from everywhere it appears; fixed the gazetteer generation code along with page and mdtList rendering code to remove links to the old map; and cleaned up a host of other stuff.
I've also put in a temporary conditional redirect (with an override I can use for my own testing) that sends visitors to the dev version of the map over to the live site; the dev URL got into the wild, unfortunately, so we'll have to keep the redirect in place for a while.
Tasks arising out of Wednesday's meeting. The nextFreeId/location template is now live, and I've also added a status column to the A-Z index page.
...for the Agas map locations. This can be extended to real geos if necessary, but let's cross that bridge when we come to it. We still have to figure out how to render these elements on Agas.
I have this working now, and it's tied into the nextFreeId system. Waiting for comments before porting to live.
With input from TL, I'm creating a further stage after nextFreeId which will generate a location file from data entered into a form, based on a template. The form is set up and working, and I've spent some time experimenting with embedding the template as XML in an XHTML script tag, following documentation on the MDN site, but it seems it will be difficult to treat this data both as XML for the purposes of injecting it and as text for the purposes of doing search-and-replace to insert the data. I may have to resort to generating the file on the server and returning it to the user.
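If it comes to that, the server-side route might look roughly like this: an XSLT run against the skeleton template file, with the form values passed in as stylesheet parameters and substituted at the placeholder points. The parameter names and the placeholder structure below are hypothetical, not our actual location template:

<!-- Sketch of the server-side fallback: run against the skeleton location
     template, with form values supplied as parameters. Element, attribute
     and parameter names here are placeholders, not the real schema. -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Values harvested from the web form. -->
  <xsl:param name="locId" select="'NEWID1'"/>
  <xsl:param name="locName" select="'Example Street'"/>

  <!-- Identity transform: copy the template through unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...except at the points where the form data belongs. -->
  <xsl:template match="tei:place/@xml:id">
    <xsl:attribute name="xml:id" select="$locId"/>
  </xsl:template>

  <xsl:template match="tei:place/tei:placeName/text()">
    <xsl:value-of select="$locName"/>
  </xsl:template>

</xsl:stylesheet>

The result could then be returned to the user as a download, which sidesteps the awkwardness of treating the embedded template as both XML and text in the browser.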
I now have a complete auto-markup process tuned and tweaked, and it also handles restoring (virtually all of) the lb and hi tags which are removed initially to facilitate name recognition. Very neat, and it's basically ready for testing to find out how accurate it is. I've built in some rudimentary timing too, so I'll run it tonight to see how it does.
Working around the problems with hi tags and with lbs in the middle of words is proving extremely tricky. I have a transformation now running over the weekend, and we'll see if it completes with useful results. Regex is now well over 600K.
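For context, the regex is assembled from the gazetteer names along roughly these lines; this is a sketch only, since the gazetteer URI and element names are assumptions and the real construction code does a good deal more:

<!-- Sketch of the named-entity regex construction: one big alternation of
     gazetteer names, with the gaps between words loosened to a delimiter
     class. The gazetteer URI and element names are assumptions. -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <xsl:variable name="gaz" select="doc('gazetteer.xml')"/>

  <!-- Delimiters permitted between the words of a name: the space character
       plus the two arrows standing in for hi tags (see the proposal below). -->
  <xsl:variable name="delim" select="'[ →←]+'"/>

  <!-- Each distinct name becomes a fragment; whitespace inside the name is
       replaced by the delimiter class, and the fragments are joined with |.
       Real code would also escape regex metacharacters in the names. -->
  <xsl:variable name="nameRegex" as="xs:string"
    select="string-join(
              (for $n in distinct-values($gaz//tei:placeName/normalize-space(.))
               return replace($n, '\s+', $delim)),
              '|')"/>

  <!-- Just report the size, for the kind of check mentioned above. -->
  <xsl:template match="/">
    <xsl:value-of select="concat('Regex length: ', string-length($nameRegex))"/>
  </xsl:template>

</xsl:stylesheet>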
One major problem we have with adapting the procedure used for MoM to MoEML is that in the Stow 1633, many names, but more frequently parts of names, have been tagged with <hi> (no attributes) to signify that they are blackletter in the original. This would disrupt our tagging capabilities, so this is what I propose:
- Identity transform which replaces opening no-att hi tags with → (right-pointing arrow followed by space), and closing tags for the same with ← (space followed by left-pointing arrow); see the sketch after this list.
- Named entity regex construction code includes the two arrow characters alongside spaces as delimiters in a character class for each regex fragment. This means they will not prevent matches (assuming the hi tags wrap at word boundaries, which is the norm).
- Text with arrows is tagged by the identity transform as planned.
- perl search-and-replace puts the <hi> elements back.
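The first step is a conventional identity transform; here's a minimal sketch, assuming TEI input. The perl stage at the end simply reverses the mapping, turning "→ " back into the opening tag and " ←" back into the closing one.

<!-- Sketch of step one: copy everything through unchanged, but flatten
     attribute-less hi elements into arrow-delimited plain text so the
     name-recognition regexes can match across them. -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="xml" indent="no"/>

  <!-- Standard identity transform. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- hi with no attributes: start tag becomes "→ ", end tag becomes " ←". -->
  <xsl:template match="tei:hi[not(@*)]">
    <xsl:text>→ </xsl:text>
    <xsl:apply-templates/>
    <xsl:text> ←</xsl:text>
  </xsl:template>

</xsl:stylesheet>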
The main potential issue is the last phase, where we might get overlapping tags instead of clean nesting. We'll have to see if that happens; if so, the perl process might be able to fix it, or a subsequent processing step might.