Category: "Activity log"
AR finished REME2 CSS, so I went through and cleaned up, moving some definitions to rendition elements, clearing up some tagging confusion, and shifting some labels around. In the process, I added the option to provide explicit style on page-break hrs by adding a rendition element in the header with xml:id="[FILEID]_pb". This will be very handy for Stow. Waiting on some metadata and proofing before we can publish.
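For example (a sketch; the file id and the CSS body here are invented for illustration), a document with xml:id="STOW5" could declare:

<rendition xml:id="STOW5_pb" scheme="css">border: none; margin: 1.5em 0;</rendition>

and the build will apply that style to the hr elements it generates for that document's page breaks.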
API project 523327615704 in my Google Control Panel has the API key for Google Maps; it previously had only mapoflondon.uvic.ca registered, but I've now also registered the jenkins server and the new mapoflondon6 dev domain, and uncommented the relevant JS; we'll see how that works when everything filters through the build process. I'll have to deploy a new XAR of MoEML to the test location at some point, but I'm making a few other changes today so I'll wait. Did some other fixes to stub documents and publication statuses.
All pages have "Cite this page", even though listings pages and search results (for instance) aren't really citable, and the link fails.
Our popup date information in the static build points to an explanatory page which is supposed to be at mol:dates, but no such file was there. I've created a basic stub with a couple of links, not yet published, but it will do the job for the moment.
I'm still trying to figure out how important it is to have exact phrase match searches working, and what the cost would be. Stopwords prevent exact phrase matches from working with the current lucene index (when the phrase contains a stopword, which it usually does); configuring other indexes just to allow this seems overkill. It may be that the way we're doing things now is fine -- although I notice that the minus sign does not work, so there's something I have to fix there.
JT's run of our diagnostic tools package against the Standard XML static output revealed thousands of errors, most falling into a few categories, which I've been addressing today:
- rs/@target was assumed to be pointing at ORGS1.xml, but it could equally point to PERS1 or BIBL1. Fixed.
- The obsolete facsimile pointers to images in the old tiled map display were not pointing at anything concrete. I've bitten the bullet and removed all of those references, along with pointers to them from elsewhere in the same file, and commented out resulting empty facsimile elements.
- File + fragment pointers were being handled wrongly, with '.xml' simply appended to the whole pointer (fragment and all). I've fixed that (see the sketch after this list).
- The @change attribute on person elements (relating to their "publication status") only makes sense in the context of the PERS1.xml file, where they point to a specific change element; I'm now removing them where person elements are copied to another file.
- Many files had actual errors that hadn't been caught before; I've fixed a lot of those manually.
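The corrected pointer logic (from the file + fragment item above) amounts to something like this (a sketch only; the function name, namespace and prefix-stripping are assumptions, not the actual build code):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:hcmc="http://hcmc.uvic.ca/ns/static">
  <xsl:function name="hcmc:resolvePointer" as="xs:string">
    <xsl:param name="target" as="xs:string"/>
    <!-- mol:STOW1#part3 should become STOW1.xml#part3, not STOW1#part3.xml -->
    <xsl:variable name="path" select="substring-after($target, ':')"/>
    <xsl:sequence select="
      if (contains($path, '#'))
      then concat(substring-before($path, '#'), '.xml#', substring-after($path, '#'))
      else concat($path, '.xml')"/>
  </xsl:function>
</xsl:stylesheet>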
I'm now building and validating various parts of the static process before I run the diagnostics again on the results.
I wanted an easy way to list all the Stow chapters in a table, and decided on a new category, mdtPrimarySourceStowChapter, which is added to the chapter files when they're generated. To make that work in the build, I had to rewrite the XML category file generation process so that it works from the site/xml/original documents (where these generated files are) rather than the XML source in /db/data (where they're not). In the process of doing that, I cleaned up the globals module so that it has no hard-coded paths in it; they're all relative to the baseDir now.
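The cleanup amounts to a pattern like this (a sketch, assuming the globals module is XSLT; the names are hypothetical):

<xsl:param name="baseDir" as="xs:string" select="'..'"/>
<!-- every other path is derived from $baseDir -->
<xsl:variable name="originalXmlDir" select="concat($baseDir, '/site/xml/original')"/>
<xsl:variable name="graphicsDir" select="concat($baseDir, '/data/graphics')"/>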
Following a request at the team meeting the other day, I've added document status displays to all the category listings pages.
Following a good team meeting with lots of discussion, I've made a formal link to the viewable-draft document so that it's easier for everyone to access drafts, and also fixed a bug in the class attribute generation that was preventing the display of the draft watermark.
The new static build version of the Map of Early Modern London project has now transitioned from alpha to beta, meaning that we believe all key features have been implemented and we're in bug-fixing mode.
Got some sample transcription from KB, and did a test encoding, tweaking both the XSLT and the schema to get more worthwhile results.
Also fixed the problem with the generic search box on regular pages; it was missing some named hidden fields, so wasn't providing enough info to the search.xql file. I do need to make this a bit more robust, though; missing fields shouldn't break the results paging navigation as it does.
Went through the steps planned yesterday, and also added DC metadata for all the people and orgs mentioned in all documents; I think the headers are now looking good.
We have an issue with a small category of documents, typified by QMPS1. These docs purport to be mdtPrimarySource, and indeed they contain transcribed primary source selections, but that content is interspersed with born-digital commentary and no attempt is made to describe the bibliographic features of the original source. When rendered in the static build as mdtPrimarySource, they end up unstyled. I've hacked around this by insisting that a true primary source document (for rendering purposes) must include at least one rendition element in the header (other than those auto-generated from @style attributes during the build process); failing this, the document is treated as born-digital. In the long run, we should find a principled solution to this issue, and JT is drafting a description of the problem for the next MoEML meeting.
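In XPath terms (with tei bound to the TEI namespace), the hack is a test something like this; how the build flags its auto-generated renditions is an assumption here, marked by the invented @resp value:

(: a "true" primary source has at least one hand-authored rendition in its header :)
exists($doc//tei:teiHeader//tei:rendition[not(@resp = '#autoGenerated')])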
In the static build output, all of the standalone documents which are created from AJAX fragments (people, bibls, orgs) have a particular problem in their headers: they are constructed based on the entire containing document rather than the fragment which forms the actual body of the document. This is because the same hcmc:createHtmlHeader() function is called passing in e.g. PERS1 rather than the target person entry. I've been trying to figure out how best to remedy this, and I think the best approach would be this:
The hcmc:createHtmlHeader() function, along with the hcmc:getDcMetadata() function it calls, should have an additional optional parameter which is the component item which is actually the focus of the page. Where this is not passed in (the default situation), the same processing would be applied; but where it is, the document title and resulting metadata should be constructed based on it. Some of that data will still have to be derived from the parent document, but (for instance) the list of places and people referenced (the former already implemented, the latter still not generated in the DC metadata) should be drawn only from the links appearing in the target fragment. There's a sketch of the two signatures after the steps below.
Steps to proceed:
- Add the parameter to the two functions.
- Fix any calls to those functions such that they pass the empty sequence or the relevant fragment.
- Test the build process to make sure nothing is broken, including search (which depends on header metadata).
- Add conditional branching to handle the different cases where a full document versus a fragment is being dealt with, for the current output. This includes ensuring that the document title is correct -- so instead of "Complete Personography" for every person page, it should be "Complete Personography: Fred Bloggs". (Not yet sure what should happen for bibliography entries, where there's no natural "title" other than the document title; perhaps the first title element in the bibl should be used.)
- Add (to both cases) processing of the person data (listing persons mentioned in dcterms.appropriatething, presumably dcterms.subject).
- Check that the outcome is appropriate for all fragment pages.
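For the first step, a sketch of the signatures, using arity overloading to emulate the optional parameter (assuming the functions are XSLT; everything except the two function names is hypothetical):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hcmc="http://hcmc.uvic.ca/ns/static">

  <!-- one-argument form keeps all existing calls working unchanged -->
  <xsl:function name="hcmc:createHtmlHeader">
    <xsl:param name="doc" as="document-node()"/>
    <xsl:sequence select="hcmc:createHtmlHeader($doc, ())"/>
  </xsl:function>

  <!-- two-argument form: when $fragment is supplied, it drives the title and metadata -->
  <xsl:function name="hcmc:createHtmlHeader">
    <xsl:param name="doc" as="document-node()"/>
    <xsl:param name="fragment" as="element()?"/>
    <xsl:choose>
      <xsl:when test="empty($fragment)">
        <!-- default case: derive everything from the whole document, as now -->
      </xsl:when>
      <xsl:otherwise>
        <!-- fragment case: title, people and places drawn from $fragment only -->
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
</xsl:stylesheet>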
I noticed that some procession maps were stored in the site/images folder rather than in data/graphics (which actually meant that they failed to appear on the static site); I've moved them and fixed references to them in the documents. As a general principle, images which form part of the site structure and design (thus appearing on lots of pages) should be in site/images, while images which are linked from specific documents appearing on the site should be in data/graphics, where editors can manage them.
Fixed the issue of images not showing up, which turned out to be due to something odd in the controller; worked more on the search to refine the feedback captions and layout thereof, at the same time reworking the build process so the captions_module.xsl file does get included in the site and XAR output.
As with Graves and Mariage, I've refactored all the XQuery main modules to namespace all variables. Also did some other cleanup, and found a couple of bugs that need fixing:
- data/xsl does not contain captions_module.xsl, a generated file which is needed by the search XQuery. Not sure why.
- When images are built into the XAR (and the flat-file site), their paths should all be collapsed to just images/* or graphics/*, which happens in /site/, but in the XAR the original paths appear to be preserved. That's just dumb, and I can't see why it's happening, but it should be a simple fix.
- Some key words, such as "London" and "map", need to be added to the stopword list.
Things learned today:
- When you create fields in a range index in exist, give them unique and unusual names. I had one called "name", and it was borking the indexing completely (stuff ended up indexed that shouldn't be).
- In XSLT, do NOT use short-form attribute constructors (attribute value templates on literal result elements) for @xml:* attributes. It's not illegal, but it's inadvisable: a literal value like xml:id="{$id}" is not a valid NCName, so xml:id-aware XML parsers will reject it. Use the explicit xsl:attribute constructor instead (see the sketch after this list). This was preventing me from uploading XSLT into eXist, so I've refactored all the static build XSLT to solve it.
- There's a bug in the Agas Map static output (and possibly elsewhere) which you can see by finding ABBE2, then clicking on "Harben" in its popup. This is an issue relating to links in AJAX-imported content, and may require that AJAX content be processed a bit before being displayed.
- Simple is good when it comes to search. One drop-down for "Search In" is all we need.
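Concretely, for the @xml:* point above (a minimal contrast; $docId is a hypothetical variable):

<!-- problematic: the literal value "{$docId}" is not a valid NCName, so
     xml:id-aware parsers reject the stylesheet itself when it is stored -->
<div xml:id="{$docId}"/>

<!-- safe: the attribute value only comes into existence at runtime -->
<div>
  <xsl:attribute name="xml:id" select="$docId"/>
</div>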
Spent a lot of time working on the search today, and ended up making it much simpler:
- We don't need an AJAX option; just appending #search_results to the form/@action param gives you the effect you want when moving between pages. All AJAX stuff now removed.
- We don't need a "search for" option; it's better to have a more refined "search in" dropdown, allowing you to search the personography or the bibliography etc.
- We have some remaining problems with what to actually search. We need to make a distinction between document content and the peripheral lists of other related documents, and at the moment that distinction is not adequately made across the different document types. Each document needs a single div, isolated from the peripheral material, containing the content that is at the heart of the document and unique to it; that's what should be searched. That'll take a reworking of the static site output.
- I've added a tiny-site build option for MoEML, because it was taking ridiculous amounts of time to build and deploy the full app. It's very crude at the moment, but I should be able to refine it over time so it covers most of what we need.
- There's some anomalous behaviour with the exist indexing of meta elements in the header; the search should pull back only (for instance) tags with @name="dcterms.title", but it seems to be retrieving others nearby. That could be caused by corrupted indexes due to an incomplete uninstall of the webapp.
As I work out the logic of the various search options, I'm finding more tweaks that I need to do to the XHTML to make it amenable to the search. This will be an ongoing thing. But the search page itself is semi-functional at least, and I have the basics mapped out in comments for the more faceted searches.
We'll need to share our XSLT, so I've added it to the site files. Also wrote some (so far untested) functionality in the search XQuery, which remains remarkably simple so far.
I've started on the XQuery to process the search page, and it's coming along. Builds are taking a long time for testing purposes, though. Must add a few more variant targets that do subsets of the work quickly.
Worked on the build process and the controller file. No search yet, but everything else is working as far as I can see. Next step is the search.
Filled out the template with more info from KB. This is coming along. Now waiting on a couple of sets of info to create personography entries, then we may be good to go.
Beginning serious work on the eXist XAR for the static build. Created the collection.xconf and updated the ant buildfile based on what we learned from Mariage. Took the stopword list from the 100 most common English words on Wikipedia; it's hardly EME, but we'll probably tweak it later as we see how variants appear. There are 35,450 files to add into the XAR, and it's 762.7 MB in size; we'll see how long it takes to deploy such a monster.
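The basic shape of the index configuration looks like this (a sketch only; the qname indexed and the wiring for the custom stopword list are assumptions, not our actual file):

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xh="http://www.w3.org/1999/xhtml">
    <lucene>
      <!-- the analyzer is where the stopword list comes into play -->
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
      <text qname="xh:body"/>
    </lucene>
  </index>
</collection>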
Got this building completely; updated the schema and the file template; more discussion with KB on the header etc.; finished the handouts; tweaked the XSLT. I think all that remains now is to do a sample encoding to make sure the XSLT and CSS work properly.
- Added students from the classlist to PERS1
- Added a new org to ORGS1 for this class, linking the students
- Revised two more of the five handouts for the new text (only #5, the metadata, needs work now)
Fixed a few encoding issues in the MoEML XML that I found in the process.
Building on the previous work, I've created an ODD file (which won't build into a schema for some reason, yet), edited a couple of the howtos and created a build script.
A Stonehill class will be encoding this text for MoEML, and I'll be developing the usual encoding package for them. We've been working out basic requirements:
- The structure is masque-like, so we'll need
- There will be nested <lg>s. We need to make sure that's handled elegantly in the rendering. We'll also need to include a taxonomy of lg types, based on vpn but perhaps much simpler.
- Alternate lines of verse are indented; we'll handle this with a rendition element in the header, which the students will point at with @rendition (as in <l rendition="#indentLine">). The handouts will need to include information about how to do that (see the sketch after this list).
- They will be encoding rhyme, but not categorizing it, so our rhyme handout can be adapted and included.
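A sketch of the pattern the handouts will describe (the CSS body of the rendition is invented for illustration):

In the header:
<rendition xml:id="indentLine" scheme="css">margin-left: 2em;</rendition>

In the text:
<lg>
  <l>The first line of the pair sits flush left,</l>
  <l rendition="#indentLine">and the second line is indented.</l>
</lg>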
This post will be updated as we work out the details.
Some tech help and consulting on the Eng362 schema and the Hornbook encoding.
Got a class list from KB and added all the students and their class to PERS1 and ORGS1.
- Filled out all the remaining rendition descriptions.
- Reworked bits of the header.
- Reworked the Author CSS.
- Rewrote a lot of the view.xsl (HTML generation), implementing decisions on how to represent blackletter and roman in the output.
- Completed a test fragment encoding and bugfixed based on it.
- Updated the schema for new elements and attributes, and tweaks to CSS-checking Schematron.
- Updated all the how-to documents to cover the new features and remove the inline CSS description.
This is now a very useful package and a highly flexible set of tools we can use for other encoding classes.
More work on the template for WY, which now has proper respStmts according to MoEML norms.
Added PERS entries for all the Eng 362 students last year, along with an org for the class; also added an org for last year's Stonehill class (although their person entries were already there).
Found two bugs in the static build, one trivial (the footer menu has one link with .htm at the end, and it has to stay because of the current site, but needs to be removed for the static processing), and one gnarly (some names processed into empty links in the rendering of orgs). The latter turned out to be due to the use of * to retrieve person elements with @xml:id; * means any element in any namespace, so it was retrieving fragmentary examples from inside egXML elements, which then did not have the required reg element inside them, resulting in empty output links.
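In XPath terms (a reconstruction, not the actual code; tei is bound to the TEI namespace):

(: too loose: matches sample encodings inside egXML as well :)
//*[@xml:id = $persId]

(: safer: only real personography entries :)
//tei:person[@xml:id = $persId][not(ancestor::tei:egXML)]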
Beginning a test encoding for myself so that I can test what I have to write to support the marginal labels and various font sizes.
Stonehill will be running a new course in November to encode the Wonderfull Yeare 1603 text, and the old materials we had for Eng 362 and the Hornbook had to be updated for the new context. I've created a new folder in workshops called encodingClass, in which I have a parameterized build process which enables you to create customized schema, template and encoding instructions documents and then use a standard build process to create a package for your class. It's tested and basically working, but I now need to enhance the new class's package to handle marginal labels using @rendition, explaining how to do it in their instructions, and making the XSLT handle it for proofing purposes.
I fixed the date contents-rendering problem discovered yesterday (so the content of dates is now actually being rendered), and lo, new problems appeared, especially with date content which is intended to be clickable (as dates are). I've ended up disallowing (by Schematron) editorial notes inside dates, and fixed a couple of instances of this in Stow by splitting the date elements around the supplied/note causing the problem. Did other minor tweaks to the extreme and the normalized views, and fixed some other encoding issues in Stow which are now revealing themselves. I also implemented a background image for items which are in peer review, and began to think about ways to render alpha and beta versions of the static site in ways that make clear that they are not releases.
The Stow rendering is all working as intended, but I made some tweaks today, including adding a new option for the user to Shift+Click on the "Diplomatic" option and get to see a real blackletter view. This is crucial for checking whether the actual encoding has correctly modelled the document features, since the regular "diplomatic" view does some font-normalization that can obscure font-shifts in the original. In the process I found and fixed a bunch of encoding errors in the CORN1 chapter.
I've implemented all the requests arising out of the meeting yesterday on rendering of Stow. Keep finding encoding problems in Stow as I'm doing this; the rendition encoding is particularly flaky at the moment.
I've integrated a new switchable view mode for Stow chapters. There's now a checkbox that appears at the top right of the page, which can be used to switch between the "normalized" view and the "original" view. These are the things that change:
- In the normalized view, there's no long s; in the original view, it's there.
- In the normalized view, most of the forme works are removed; I retain only the page numbers, which are all centred.
- Line-breaks and line-break hyphens are removed.
- Blackletter fonts are eliminated, and roman fonts used in contrast to blackletter are turned to italics.
This also means that you can Control + F search using regular s in the normalized view, but with long s in the original view.
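The entry doesn't spell out the mechanism, but one plausible sketch (all class names and markup here are invented) is to emit both forms and let a class on the body choose between them:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <style>
      body.normalized .orig { display: none; }
      body.original   .reg  { display: none; }
    </style>
  </head>
  <body class="normalized">
    <p><span class="orig">ſundrie</span><span class="reg">sundrie</span> places</p>
  </body>
</html>

Since browsers don't match hidden text in a find-in-page search, an approach like this would also explain the Control + F behaviour described above.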
The static build broke at the end of the day yesterday, and the problem proved to be extremely difficult to track down. In the end it turned out to be caused by some instances of <name> elements with @rendition but no @ref (due to an encoding error). To avoid this, I started by adding a Schematron rule, because I saw no reason why these should exist; however, it turns out that there are quite a lot of them, and although I fixed one or two, I had to give up working on them and instead added a new check to the diagnostics which ensures that we at least know they're there; they're mostly in unfinished mayoral shows.
I still have to figure out why this broke the XSLT. In the meantime, I've also reversed my decision to have ids.htm but not azindex.htm; since the latter was always linked from the footer, that's what we should go for, and we should fix the links to ids.htm (I've already added a redirect). I've fixed a bunch of other bugs in processing too, including the missing initials in editorial notes. We are slowly making progress.
Although I'm still tweaking a bit, I've basically finished the listings page that becomes ids.htm. I've drawn its content from the site/xml/original content, so that generated files are also included, and I've replaced the original azindex.htm (which was a published-only version of the same page) with a simple redirect. No reason to hide the fact that we're working on something, although we may not make the something completely available till it's published.
Got this working before leaving at the end of the day; did some more work at home and first thing this morning to clean up all the now-unnecessary @rendition attributes in Stow. This all has the added bonus that lots of relatively unedited Stow chapters will look reasonably good in the static build output.
In the eat-your-own-dogfood dept, I decided over the weekend that it would be a good idea to make the effort to use the @selector attribute on <rendition>, instead of filling the body of the document with huge numbers of repeated @rendition attributes. However, this requires that we process the results, and actually reconstruct all those pointers by resolving @selector at the standalone XML generation stage. I've put that into practice today. It's a slightly tricky process involving the creation and execution of a temporary XSLT file for every document concerned; I don't yet know whether it will work, but I think it can be made to do so, and I've consequently pruned Stow 1598 quite fiercely to remove a lot of these attributes.
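For example (a sketch; the selector and CSS are invented), a single header declaration like:

<rendition xml:id="italicFont" scheme="css" selector="foreign, title">font-style: italic;</rendition>

applies to every foreign and title element in the document; the temporary XSLT generated at the standalone stage then rewrites this back into explicit rendition="#italicFont" pointers on those elements, so downstream processing never needs to understand CSS selectors.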
For a while now, as the static build process has become more complicated and lengthy, it's been a bit tedious to wait to see the results of changes you made to the build or the documents. Today I reworked the whole process to parameterize it, so that you can pass in a set of document ids and have it process only those ids. This should only really be done in a context where the whole build has been run at least once, to give you all the surrounding framework for a functioning site, but after that it's very quick.
You can now send a single document or a few through the build process like this:
ant -lib ../utilities -DdocsToBuild=ABCH1,stow_1598_CORN1 subset
This will complete the entire build process including all the various XML outputs as well as the XHTML, for the specified ids. It doesn't do any validation. It does copy all the schemas, JS files, CSS etc. to the site folder, since that's trivial and quick, and often needed if you're working on rendering issues.
The individual stages have also been parameterized, so you can (for instance) just generate the XHTML5 for a specific document like this:
ant -lib ../utilities -DdocsToBuild=ABCH1 createXhtmlDocs
This should significantly streamline our dev process.
At the moment, I see no reason to parameterize the validation targets, but we may decide we want to do that at some point.
Last night I added RSS feed generation to the static build, with the blog feed, then supplemented it with the news feed this morning. In the process I discovered that our existing feeds have been slightly broken for ever; they were pointing at mapoflondon.uvic.ca/redesign, which redirects to the correct location, but still; fixed now.
Added fixes for:
- Paragraphs which were losing class="para" when they had @rendition.
- Images which had style attributes to size them, but the attributes were not being applied.
- Items in the blog post listings page, which needed their sample text.
- Images in the blog post listings page, which needed to be thumbnailed and gray-scaled.
In order to be able to back completely away from the original-document-style display, it makes sense to be able to provide a real hard-core rendering of the document, using blackletter and all that; if that's available, then we can switch away from it with much more freedom to transform the rendering to something more standardized and reader-friendly. I've been working on that for most of this afternoon, and we're making some progress. Lots more work to do, though.
Did the POPE ones because they came up while I was working on KING12 and I already had the info. I think we're now left with about a dozen locations for Cornhill.
In the loc-a-thon, did stubs/abstracts for KING2, FLEE6, STMA1 and TEMP1; found a modern location for POPE6; and fixed a bunch of bad links in Stow in the process.
No time to report...
The appearance of an invalid file that broke builds prompted me to implement a Schematron rule that checks that your document filename matches the root element's @xml:id. That then triggered a bunch of errors in the static build because we were being cavalier about that sort of thing when creating and using temporary files during the build process. I've now fixed all those errors.
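The rule amounts to something like this (a sketch, assuming the XSLT 2 query binding; the real rule may differ):

<pattern xmlns="http://purl.oclc.org/dsdl/schematron">
  <rule context="/*">
    <assert test="tokenize(document-uri(/), '/')[last()] = concat(@xml:id, '.xml')">
      The document filename must match the root element's @xml:id.
    </assert>
  </rule>
</pattern>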
- Duplicate rendering of names in popups and person pages now fixed.
- Footnotes are now working properly, with Schematron preventing use of unnecessary paragraphs inside them.
- I fixed a bunch of issues in Stow 1598 (bad links, missing style, etc.), and in some other files.
- I added test rendering in CSS for the empty hi elements from Stow, so we can visualize what they might look like.
- Made the map work slightly differently when you link to a single location: now it's selected.
- I've made a start on the creation of the A-Z listings page, but have no content in it yet. There are decisions to make about how linking should work from that page. Obviously we don't want to populate it with everything on the site.
JT and I finally figured out what the current Schematron warnings are about in the build process, while debugging a rule which didn't seem to be firing. It seems that the Java Schematron compiler is quite crude in the way it builds XSLT from rules, and won't properly process rules whose context matches attributes. We've fixed today's problem rule as a proof of concept (see the sketch below), and tomorrow JT will work his way through the others to ensure that the rules are really all being applied to our static build XML.
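The workaround is to rewrite each rule so that its context is the element carrying the attribute rather than the attribute itself (a sketch; this element and test are hypothetical):

<!-- not reliably compiled: context matches an attribute -->
<rule context="tei:name/@ref">
  <assert test="starts-with(., 'mol:')">Local references should use the mol: scheme.</assert>
</rule>

<!-- equivalent rule with an element context -->
<rule context="tei:name[@ref]">
  <assert test="starts-with(@ref, 'mol:')">Local references should use the mol: scheme.</assert>
</rule>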
Got the hash-based linking working, which means you can link to the mention of a specific named item in a text.
Fixed a whole bunch of errors and problems both in encoding and in rendering. There are still a few obvious issues, but I think we're now in the mopping-up phase.
Made a number of fixes today aimed at getting the static build to validate successfully:
- There are many cases in Stow of linking elements (ref, name etc.) containing supplied elements, which contain notes. The notes are turned into links, which means we have embedded links. In all cases (about 15) I've split the outer linking elements using @next and @prev (see the sketch after this list). This will result in some odd entries in the spelling variants, but they're possibly justifiable, since they are what appears in the original text.
- The Map was using GN's UI font, and specifically referencing a spinner/wait character which was in the PUA. We've now moved that character to a more strictly correct location based on its purpose (U+23F3, hourglass with flowing sand).
- When updating the font, we checked on browser support and decided to eliminate all eot, svg and ttf fonts, on the basis that the woff version should be enough. This simplifies the CSS and cuts down on the distro size.
- I've re-encoded a complicated nested table structure in Stow "Law".
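The @next/@prev split looks like this (a sketch; the ids, name and note content are invented):

<name xml:id="moor_a" next="#moor_b" ref="mol:MOOR1">Moore</name>
<supplied>gate<note resp="mol:EDIT1">the editorial note that forced the split</note></supplied>
<name xml:id="moor_b" prev="#moor_a" ref="mol:MOOR1">strete</name>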
We need to get to a point where the XHTML5 is valid so that we can browse the results on Jenkins. Found a few remaining issues in the data, including combining diacritics detached from their original char, so I've added a Schematron rule for that. Fixed some XSLT bugs, and I'm now at a point where I think only novel errors from Stow are likely to be thrown up. Waiting for the long build process to complete now, to see the latest crop of errors.
I've gone through the original primary_source.xsl and general.xsl to transfer and/or adapt all the remaining templates that I believe we need for the static build, integrating them all into the single xhtml5 mode, and trying to avoid distinguishing between them too much. At this point documents are building happily and looking not-too-bad, but there are still invalidities (some in bornDigital docs, caused by the introduction of templates from primary source), and also I think there are changes that need to be made to the chapter-splitting Stow code to make sure that the base style info for the page width etc. is imported from the parent document into every chapter. Getting there, though. Another few days...
The convention of using hard .htm links to generated pages is not appropriate for the static build, where those generated pages end up actually existing as XML documents, so we need to create alternative approaches to all of those instances. The biggest case was the mdt...listings documents, and I've dealt with that today by creating a new mdtlist: private uri scheme which provides a way of linking to any of those document type taxonomy pages. The current eXist webapp has been updated to deal with this, as has the Schematron, and I've written all the handlers necessary for the static build. I also updated the Praxis documentation appropriately. There's a small number of remaining generated documents (A-Z Index being the prime example) which will also have to be handled.
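For illustration (the linking markup here is a guess at how the scheme is used, not copied from the docs):

<ref target="mdtlist:mdtPrimarySource">primary source documents</ref>

Both the updated eXist controller and the new static-build handlers resolve mdtlist: pointers to the appropriate document-type listing page, so the same encoding works in either context.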
Core decisions from meeting:
- We will finish static build work on born-digital docs, and then make Cornhill the focus of the primary source work, so we can have a version ready for the reviewer soonest.
- We will consider converting document status="stub" to a document type taxonomy item mdtStub, but this will be done through the static build and not back-ported. Humans will have to address the status of documents which currently have this status; meanwhile, the static build code can read the document type instead of the status for use in listings pages.
- MAPS1.xml will be converted to a listBibl of bibls by TL.
We still have a couple of bugs that require some decisions from the team, but we're getting close. Soon we'll be able to turn our attention to the primary source docs.
JT has been building a list of what's not yet working in the born-dig output, and we've already addressed a number of these issues; I believe we now have dates working as we wanted (with and without JS), and I've also got the home page working properly, which involved importing and tweaking some more templates from the old XSLT. Flattened folder structure in the graphics subfolder is now supported, and other stuff looks better. There's a list of a dozen or more other fixes we need to make.
Looked at the static build output for born-digital docs, and:
- fixed some CSS display issues in the footer and popup boxes
- enabled the rendering of mdtDatabase files to XHTML5 (why not?)
- fixed a bug in the generation of category subpages
- tweaked the document titles for a couple of those so that they make more sense when rendered
- fixed some Schematron issues (more fixes needed, I think).
Went through all the data subcollections containing XML on eXist, and refreshed all the content; then removed a bunch of obsolete files, duplicates in the wrong place, and other errors, as well as resetting permissions on a bunch of files. Updated all the generated content too.
The Jenkins build for static was failing because Saxon was running out of memory. After a bit of experimentation with ANT_OPTS, which didn't work, I've now settled on this for the java tasks which run Saxon:
<java fork="true" classname="net.sf.saxon.Transform"
      classpath="../utilities/saxon9he.jar" failonerror="true">
  <jvmarg value="-Xmx1024m"/>
  <arg value="-s:xsl/xhtml_docs_master.xsl"/>
  <arg value="-xsl:xsl/xhtml_docs_master.xsl"/>
  <arg value="--suppressXsltNamespaceCheck:on"/>
</java>
This works; note that the task needs @fork="true" before the <jvmarg> element will take effect.
Many fixes throughout the processing of listPerson in org, as well as updates to the build script to try to force the build to fail more often when e.g. validation throws up errors. Schematron is still not failing the build when it should; needs more attention because the failonerror parameter for the ant task seems to be broken. Also found and fixed a bunch of encoding errors, and tweaked Schematron to prevent them.
Lots of stuff in the static build was actually invalid, so I spent a while fixing issues with nested links and so on, and tweaking problems with the bibl and pers standalone HTML files. I've also started work on the ORGs, and I believe we have a strategy for dealing with it, but it's only half-implemented.
Pages are now being created for people and bibls, and I'm working on some of the issues around the display and linking of orgs -- nesting, along with the listing of people via empty person elements with @corresp, is causing a couple of issues. We're definitely getting there, though.
That's now working in the XHTML5 output, so I can move on to the next thing.
First off, today I got the document type taxonomy "original" XML building, then followed it through to XHTML output, correcting things all the way. That's now fine.
Popup variant-spelling functionality is still not working correctly on the gazetteer XHTML pages, so that's next on my list.
I now have the whole variant spelling process working well, and I've done a bunch of other bugfixes on both XSLT and data to arrive at a point where the whole HTML5 output is now validating (although there's one warning about a private use character which TL is working on). Good progress today.
I've enhanced the variant spellings stuff a bit and I'm now generating AJAX fragments from it; this will serve as content for the gazetteer pages and for the location page files.
After much discussion, we've decided to go with a system based on that outlined two posts ago, with some additional clarification about the status of bibliographic codes appearing between sections. If a new chapter begins at the head of the page, the formeworks etc. appearing above should not be part of its div, but they should be harvested to be part of its single-chapter TEI file, providing the bibliographic context for the conceptual div that follows them. Ditto for bibliographic elements appearing after the end of a chapter div. This means that such elements should first be moved out of their containing divs, making them div-liminal, and then the chapter-splitting code needs to take account of them. In addition, any page break preceding a chapter div which starts in the middle of a page must be harvested, and copied (with a special @type attribute yet to be decided) to the head of the chapter file being created. The existence of this will enable us not only to link to the page image, but also to determine that the chapter does not start at the top of the page, and signal this in some way in the rendered XHTML.
With the exception of the pb/@type attribute change, I've made all the changes necessary to the schema to accommodate these decisions, and we've added Schematron to stay on top of how they're supposed to function.
I decided to generate not just the linkGrp structures but a fully-expanded list of lists of variants, each with a link to the actual document from which it comes, constructed from its title, as the body of the variant spelling output. This saves additional steps in downstream processing. That's now working and valid.
I needed a much cleaner and more robust set of variant spelling data, not only for the gazetteer, but also for the provision of variant spellings with links in the location files themselves. I've now completely rewritten that code, and extracted it from the xml_create_generated_master.xsl file, so that it can run separately first, and its output can be used by the latter process. I had to do some schema-tweaking to make it valid, because all our usage of <link> up to now has been for a single purpose, and highly constrained, but this is more generic.
JT and I have worked out a plan for handling the Stow 1598 in such a way that we can track the publication status of separate chapters, and create standalone versions of those chapters in XML and XHTML. This will enable us to handle peer-review and publication on a chapter-by-chapter basis. Here's how it works:
- Each identified chapter will constitute a distinct <div>.
- In the <teiHeader> of the Stow document, the <revisionDesc> element will contain a <listChange> element; before the existing change elements, its first child will be a nested <listChange>.
- Inside this listChange will be a single change element corresponding to each chapter:

<listChange xml:id="stow_1598_chapter_status">
  <change xml:id="stow_1598_cripplegate_ward_status" when="2016-04-12" who="mol:DUNC3" status="draft"/>
  <change xml:id="stow_1598_breadstreet_ward_status" when="2016-04-10" who="mol:LAND2" status="final"/>
  [...]
</listChange>

- Each individual chapter <div> will have a @change attribute pointing to its corresponding <change> element:

<div xml:id="stow_1598_cripplegate_ward" change="stow_1598_cripplegate_ward_status">
  [...]
</div>

- When we process the overall file to create each of the individual chapter files, we take the @status attribute from the corresponding <change> element in the header and give its value to the @status in the header of the chapter file.
- The <listChange> element in the header can be processed as a key to split out the individual chapters, and also to generate the modern table of contents page.
- When an encoder or editor determines that a chapter has reached a new stage, that person updates the corresponding <change> element in the header to specify the new status and date.
I've updated the ODD and regenerated the schema to allow for this.
Fixes for problems with items duplicated in the XHTML output, due to being in the body as well as in the header, mostly in XIncluded documents. Output is currently still invalid due to namespace issues arising from date code, which JT is fixing.
I've moved the generation of mdt category files back to the original XML collection, and tweaked accordingly; I've also made a number of changes to the schema linking in all of those files (incorporating the rng file as a Schematron source for additional checks), and now all the original XML stuff is validating correctly, as is the standalone which is generated from it.
I decided that it would benefit us to process our source encoding into something cleaner for the "original XML" folder, expanding XIncludes and standardizing relative paths to schemas, so I've done that; the standalone is now based off the original folder. Having done that, I'm now back to building the gazetteer XML content, and working on the index of variants to document lists which will be harvested for the variant set at the top of each location page. It's basically working, but I'm still wrestling a bit with problems of whitespace affecting validity. I think I've solved (for the moment at least) the problem of creating a unique id for each variant spelling, based on the spelling itself.
After some thought, I've decided on the following strategy for handling the gazetteer, and implemented some of it:
- We generate TEI files for each of the "letters" of the gazetteer alphabet, giving us a source file for each of those pages. That's working, along with generation of the JSON. These files require specific markup practices which the schema now allows for, and use two new private URI schemes.
- These files are put into the site/xml/original folder.
- They are validated there along with all the others.
- Validation will require some tweaking to the copied originals and these new ones, to do a couple of things: expand the XIncludes, and tweak the schema PIs so that they point to the correct location.
- From these files, we generate another file which contains a list of all the locations, and for each location, a list of the variant spellings, and for each variant, a list of links to the documents that use it. This will be used as the basis for providing the variant spelling collection at the head of each location file, along with appendix items which can be turned into popups showing links to all the documents.
- Gazetteer XHTML5 pages will be generated in the normal way from the "original" source files, once those have been converted to standalones. This will require addition of specific templates to match the tabular encoding and particular URI schemes (molagas: and molvariant:) which I've introduced as part of this construction.
- Ditto lists of variant spellings for locations, except that they're not yet links; haven't figured out how to handle that yet.
- I now have a system in place for handling draft documents, which works a treat (and should work fine with and without JS too). You just add #showDraft to the URL to see the full document and hide the warning.
I've started work on creating the body output for born-digital documents. I have simple output rendering in a recognizable way. One thing I realize is that I'll now need to go back and write functions for creating the Agas Map, Google Map, variant spelling and docs-with-mentions blocks at the top of the page. These will presumably be hidden by default but shown when their heading link is focused, or something like that; then JS will inject a hide-show function. Much work to do here, but it's all in nice modular nuggets that'll be easy to work on in between other projects.
I've implemented the page content menu for XHTML output pages, and it's working well; results are slightly different than for the current output, but I believe they're actually more logical. Much use made of tunnelled parameters. I've also implemented a sortKey system for bibls, so that when they're imported into documents as references, they get ordered correctly. There are a number of things I need to fix, though, relating to the display of footnotes (which are not hidden by default), and I also somehow need to handle the problem of bibls displayed in the document not having their titles converted into links. The issue is that I was thinking ahead to the single-page display, where the title obviously shouldn't be a link, but I probably have this backwards: the AJAX fragment should have a link, pointing to the static page, and when that is processed into the embedded reference it should be retained, but when the static page is created, it can be removed. However, the in-page instance of the link needs to generate the popup, in order to show the "cited in" bit, whereas the popup needs to jump off to the single page view. Too complicated for my simple brain today; come back to it next week. Meanwhile, the new stuff is working well.
In the mapathon, managed to map three new places on Agas, and eliminated several duplicate or inappropriate (because not in London) places.
In the static build work:
- Got the orgs correctly rendering in the AJAX (they were invalid).
- Got them picked up appropriately into the larger documents.
- Added some CSS to style the appendix content for users who don't have JS enabled.
- Wrote the JS to change local links to popup calls.
- Wrote the JS to hide the appendix (too crude right now; must be refined).
- Wrote the JS to redirect mdt list pages with querystrings to their static equivalents.
- Tidied up the citation output.
- Fixed various validity problems, including changing ids in the footer to classes to avoid clashing with existing document ids, and corresponding CSS.
I've finished building the headers and page chrome, and I have all the images being intelligently copied over from the data and site folders, so things are beginning to look right. There are a number of oddities that need fixing, and I also need to generate all the static pages for the fragments (bibls, people etc.), but there's clear progress. We face the problem of the Google Maps API requiring a host on a specified URL, so I've commented out that code for now; we need our own OL3/OSM/tiled solution for these maps to get away from Google's restrictions.
After some rather hacky fixes to avoid nested links and empty heading elements, the entire corpus of XHTML documents produced in the static build now validates. The documents don't have any bodies yet, of course, but we are close to beginning work on that.
- Appendix now includes people and places.
- A new xhtml_docs.xsl file is integrated into the build, to create each of the XHTML5 output docs.
- The building and validation of these docs is now integrated into the static build.
We now seem to be able to build all the XHTML5 documents from full TEI sources successfully, but many do not validate, for a variety of reasons. Many relate to the source data (I fixed many issues today, including lots of bad bibl linking, which JT is now trapping for in diagnostics, and some nasty unescaped URL characters), but nesting of links is also a problem, one that we'll have to cautiously weed out through the XHTML templates.
The left column processing is now complete, so I think we have the basic chrome of the page already. I've also added handling for related items from LINKS1 entries to the appendix. There is much more to do in the appendix, though; I'm not yet handling PERS entries, for instance, or places.
I'm busy creating the functions which build the page chrome at the moment, and today I wrote the ones that create the credit menu, the XML menu, and the category information. I also refactored some earlier ones so that they now build based entirely on the standalone XML rather than referring back to the original source documents; and I've tweaked the standalone build process a bit too. I now need to create AJAX fragments for the related-documents/disambiguation popups, which are based on LINKS1, and then incorporate those into the Appendix.
I'm now generating XML versions of the document type listings pages, so that these can be converted to XHTML5 along with other XML docs. They're standalones like the others. There are some wrinkles: the listings pages for e.g. locations contain links to the Agas Map, but there's no XML link that makes sense for this, since the map itself is not an XML document, so for this and other similar minor issues I'll have to adopt protocols that the XHTML templates can recognize. I've rebuilt the project schema from the bleeding-edge P5 in order to get the <table> element with att.typed, giving me some options for carrying info forward to the XHTML transformation.
Did a bunch more work arising out of yesterday's mapping session, converting stub contents into abstracts, and followed up on more places that I was able to identify.
Found a few new locations, fixed a few bad ones.