I now have the variant spellings being automatically harvested into the HTML output for location documents, and the result works correctly both with and without JavaScript. I'm now rewriting the code that generates the gazetteer pages themselves so that, instead of working out its own set of variants, it simply harvests them from the spelling_variants.xml file, which has already been created. Once that's working, the existing appendix code should take care of pulling in the variant spellings when generating the HTML.
I've enhanced the variant spellings stuff a bit and I'm now generating AJAX fragments from it; this will serve as content for the gazetteer pages and for the location page files.
After much discussion, we've decided to go with a system based on the one outlined two posts ago, with some additional clarification about the status of bibliographic codes appearing between sections. If a new chapter begins at the head of the page, the forme works etc. appearing above it should not be part of its div, but they should be harvested into its single-chapter TEI file, providing the bibliographic context for the conceptual div that follows them. Ditto for bibliographic elements appearing after the end of a chapter div. This means that such elements should first be moved out of their containing divs, making them div-liminal, and then the chapter-splitting code needs to take account of them. In addition, any page break preceding a chapter div which starts in the middle of a page must be harvested and copied (with a special @type attribute yet to be decided) to the head of the chapter file being created. This copied page break will enable us not only to link to the page image, but also to determine that the chapter does not start at the top of the page, and to signal this in some way in the rendered XHTML.
With the exception of the pb/@type attribute change, I've made all the schema changes needed to accommodate these decisions, and we've added Schematron rules to check that they function as intended.
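The chapter-splitting rules above can be sketched with Python's standard-library ElementTree. The project's own processing is XSLT, so this only illustrates the logic. It assumes that pb, fw and cb are the bibliographic elements in question, that chapter divs are direct children of <body>, and that chapter content is wrapped in block elements (p, head, etc.) so no text is stranded when a pb or fw is moved. CARRIED_PB_TYPE is a hypothetical placeholder, because the real @type value has not been decided.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
DIV = f"{{{TEI_NS}}}div"
PB = f"{{{TEI_NS}}}pb"
# Assumption: the "bibliographic" elements that may sit between chapters.
BIBLIO = {f"{{{TEI_NS}}}{name}" for name in ("pb", "fw", "cb")}
# Hypothetical placeholder: the real pb/@type value is still to be decided.
CARRIED_PB_TYPE = "HYPOTHETICAL_carried"


def hoist_liminal(body):
    """Move runs of bibliographic elements at the start or end of each
    chapter div out into the body, so that they become div-liminal."""
    for div in [el for el in body if el.tag == DIV]:
        kids = list(div)
        lead = []
        for el in kids:  # leading run, in document order
            if el.tag not in BIBLIO:
                break
            lead.append(el)
        trail = []
        for el in reversed(kids[len(lead):]):  # trailing run, scanned backwards
            if el.tag not in BIBLIO:
                break
            trail.insert(0, el)  # keep document order
        pos = list(body).index(div)
        for offset, el in enumerate(lead):
            div.remove(el)
            body.insert(pos + offset, el)  # just before the div
        pos = list(body).index(div)
        for offset, el in enumerate(trail):
            div.remove(el)
            body.insert(pos + 1 + offset, el)  # just after the div


def split_chapters(body):
    """Return {chapter xml:id: copied elements for that chapter's file}:
    the liminal run above the div, the div itself, and the run below it."""
    kids = list(body)
    doc_order = list(body.iter())
    chapters = {}
    for i, div in enumerate(kids):
        if div.tag != DIV:
            continue
        start = i
        while start > 0 and kids[start - 1].tag in BIBLIO:
            start -= 1
        end = i + 1
        while end < len(kids) and kids[end].tag in BIBLIO:
            end += 1
        parts = [copy.deepcopy(el) for el in kids[start:end]]
        # No pb directly above the div, so the chapter starts mid-page:
        # copy the last preceding pb and mark it with the special @type.
        if not any(el.tag == PB for el in kids[start:i]):
            earlier = [el for el in doc_order[:doc_order.index(div)] if el.tag == PB]
            if earlier:
                carried = copy.deepcopy(earlier[-1])
                carried.set("type", CARRIED_PB_TYPE)
                parts.insert(0, carried)
        chapters[div.get(XML_ID)] = parts
    return chapters
```

In this sketch, a liminal run lying between two chapters is copied into both chapter files; that is one possible reading of the rules, not a settled decision.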
I decided to generate not just the linkGrp structures but a fully-expanded list of lists of variants, each with a link to the actual document from which it comes, constructed from its title, as the body of the variant spelling output. This saves additional steps in downstream processing. That's now working and valid.
I needed a much cleaner and more robust set of variant spelling data, not only for the gazetteer, but also for the provision of variant spellings with links in the location files themselves. I've now completely rewritten that code, and extracted it from the xml_create_generated_master.xsl file, so that it can run separately first, and its output can be used by the latter process. I had to do some schema-tweaking to make it valid, because all our usage of <linkGrp> and <link> up to now has been for a single, highly-constrained purpose, but this is more generic.
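The "list of lists" shape of the variant spelling output can be sketched in Python as follows, assuming the harvesting step has already produced (location id, spelling, document id, document title) tuples. The link pattern {doc_id}.htm and the use of <term> and <ref> inside nested TEI <list> elements are illustrative assumptions, not the actual output schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"


def t(name):
    """Clark-notation name for a TEI element."""
    return f"{{{TEI_NS}}}{name}"


def build_variant_lists(occurrences):
    """occurrences: iterable of (location_id, spelling, doc_id, doc_title).
    Returns a TEI <list> with one item per location, each holding a list of
    its spellings, each of which holds a list of links to the documents."""
    # location -> spelling -> {doc_id: title}; the dict drops repeat uses
    # of the same spelling in the same document.
    index = defaultdict(lambda: defaultdict(dict))
    for loc, spelling, doc_id, title in occurrences:
        index[loc][spelling][doc_id] = title

    outer = ET.Element(t("list"))
    for loc in sorted(index):
        loc_item = ET.SubElement(outer, t("item"), {"n": loc})
        spellings = ET.SubElement(loc_item, t("list"))
        for spelling in sorted(index[loc]):
            sp_item = ET.SubElement(spellings, t("item"))
            ET.SubElement(sp_item, t("term")).text = spelling
            docs = ET.SubElement(sp_item, t("list"))
            # Links are ordered by document title; the link text is the title.
            for doc_id, title in sorted(index[loc][spelling].items(),
                                        key=lambda kv: kv[1]):
                doc_item = ET.SubElement(docs, t("item"))
                ref = ET.SubElement(doc_item, t("ref"),
                                    {"target": f"{doc_id}.htm"})  # hypothetical URL pattern
                ref.text = title
    return outer
```

Fully expanding the links at this stage means later steps (the gazetteer and the location pages) can copy the lists without looking anything up again.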
JT and I have worked out a plan for handling the Stow 1598 in such a way that we can track the publication status of separate chapters, and create standalone versions of those chapters in XML and XHTML. This will enable us to handle peer-review and publication on a chapter-by-chapter basis. Here's how it works:
- Each identified chapter will constitute a distinct <div> with an @xml:id.
- In the <teiHeader> of the Stow document, the <revisionDesc> element will contain a <listChange> element; before the existing <change> elements, its first child will be a nested <listChange> element with @xml:id="stow_1598_chapter_status".
- Inside this <listChange> there will be one <change> element corresponding to each chapter:

      <listChange xml:id="stow_1598_chapter_status">
        <change xml:id="stow_1598_cripplegate_ward_status" when="2016-04-12" who="mol:DUNC3" status="draft"/>
        <change xml:id="stow_1598_breadstreet_ward_status" when="2016-04-10" who="mol:LAND2" status="final"/>
        [...]
      </listChange>

- Each individual chapter <div> has a @change attribute pointing to its corresponding <change> element:

      <div xml:id="stow_1598_cripplegate_ward" change="stow_1598_cripplegate_ward_status">
        [...]
      </div>

- When we process the overall file to create each of the individual chapter files, we take the @status attribute from the corresponding <change> element in the header and give its value to the <revisionDesc>/@status attribute in the header of the chapter file.
- The <listChange> element in the header can be processed as a key to split out the individual chapters, and also to generate the modern table of contents page.
- When an encoder or editor determines that a chapter has reached a new stage, that person updates the corresponding <change> element in the header to specify the new status and date.
I've updated the ODD and regenerated the schema to allow for this.
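The status hand-off described in the list above can be sketched in Python with ElementTree, as an illustration only (the real processing is XSLT). It builds just the <revisionDesc> for each chapter file; a real chapter header would also need <fileDesc> and the rest. It accepts @change values with or without a leading "#", since TEI pointers to local elements normally take the "#id" form while the example above omits it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def chapter_statuses(root, list_id="stow_1598_chapter_status"):
    """Map each chapter-status <change> @xml:id to its @status value."""
    for list_change in root.iter(f"{{{TEI_NS}}}listChange"):
        if list_change.get(XML_ID) == list_id:
            return {change.get(XML_ID): change.get("status")
                    for change in list_change.findall("tei:change", NS)}
    return {}


def chapter_revision_descs(root):
    """For every <div> with @change, build the <revisionDesc> for its
    chapter file, copying @status from the <change> that @change points to."""
    statuses = chapter_statuses(root)
    result = {}
    for div in root.iter(f"{{{TEI_NS}}}div"):
        pointer = div.get("change")
        if not pointer:
            continue  # not a chapter div
        rev = ET.Element(f"{{{TEI_NS}}}revisionDesc")
        status = statuses.get(pointer.lstrip("#"))  # accept "id" or "#id"
        if status:
            rev.set("status", status)
        result[div.get(XML_ID)] = rev
    return result
```

The mapping returned by chapter_statuses lists every chapter once, in header order, so it could also serve as the key for generating the table of contents page.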
Fixed problems with items being duplicated in the XHTML output because they appear in both the body and the header, mostly in XIncluded documents. The output is still invalid because of namespace issues arising from the date code, which JT is fixing.
I've moved the generation of mdt category files back to the original XML collection, and tweaked things accordingly; I've also made a number of changes to the schema associations in all of those files (incorporating the rng file as a Schematron source for additional checks), and now all the original XML files validate correctly, as does the standalone generated from them.
I decided that it would benefit us to process our source encoding into something cleaner for the "original XML" folder, expanding XIncludes and standardizing relative paths to schemas, so I've done that; the standalone is now based on the original folder. Having done that, I'm now back to building the gazetteer XML content, and working on the index that maps each variant to a list of documents, which will be harvested for the variant set at the top of each location page. It's basically working, but I'm still wrestling a bit with whitespace problems that affect validity. I think I've solved (for the moment at least) the problem of creating a unique id for each variant spelling, based on the spelling itself.
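One possible way to derive a unique id from the spelling itself (not necessarily the approach actually adopted) is a reversible escaping scheme: collapse whitespace, NFC-normalize, keep ASCII letters and digits, and escape every other character, including the underscore itself, as _<hex code point>_. Because the escapes can be decoded unambiguously, two spellings that still differ after normalization can never receive the same id, and the result is always a valid NCName for @xml:id. The var- prefix and the function name variant_id are hypothetical.

```python
import unicodedata


def variant_id(spelling, prefix="var-"):
    """Derive a deterministic xml:id (a valid NCName) from a spelling.

    Whitespace is collapsed and the text NFC-normalized first, so stray
    spaces or decomposed accents do not create spurious variants. ASCII
    letters and digits are kept; every other character (including '_')
    is escaped as _<hex code point>_, which keeps the mapping injective.
    """
    text = unicodedata.normalize("NFC", " ".join(spelling.split()))
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"_{ord(ch):x}_")
    return prefix + "".join(out)
```

For example, variant_id("Cheap side") gives "var-Cheap_20_side", while "Cheap_side" gives "var-Cheap_5f_side", so the two spellings stay distinct.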
After some thought, I've decided on the following strategy for handling the gazetteer, and implemented some of it:
- We generate TEI files for each of the "letters" of the gazetteer alphabet, giving us a source file for each of those pages. That's working, along with generation of the JSON. These files require specific markup practices which the schema now allows for, and use two new private URI schemes.
- These files are put into the site/xml/original folder.
- They are validated there along with all the others.
- Validation will require some tweaking of the copied originals and these new ones, to do two things: expand the XIncludes, and adjust the schema PIs so that they point to the correct location.
- From these files, we generate another file which contains a list of all the locations, and for each location, a list of the variant spellings, and for each variant, a list of links to the documents that use it. This will be used as the basis for providing the variant spelling collection at the head of each location file, along with appendix items which can be turned into popups showing links to all the documents.
- Gazetteer XHTML5 pages will be generated in the normal way from the "original" source files, once those have been converted to standalones. This will require the addition of specific templates to match the tabular encoding and the particular URI schemes (molagas: and molvariant:) which I've introduced as part of this construction.