Mariage didn't have any handling for the legacy URLs of documents which have been out in the wild for a decade. I've now fixed that, with a redirect.xql modelled on the Scancan one. I've also incorporated the same detailed SVN info into the footer that MoEML has.
Processed the old list/item structure of the TOC into a series of linked tables for better compatibility with the new site layout. This was part XSLT, part manual work. There's still a lot to do on the Ville Thierry layout.
Fixed two bugs: gravures when mixed with other docs were not being sorted correctly, and accented characters were not being accounted for in the sort routine.
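To illustrate the second bug, here is a minimal Python sketch of an accent-insensitive sort key. It is not the project's actual sort routine (which lives in the site's own code); the function name and the sample titles are hypothetical, and full French collation has further rules that this ignores.

```python
import unicodedata


def accent_insensitive_key(title):
    """Return a sort key that ignores diacritics and case (hypothetical helper)."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks so that "é" sorts with "e".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original title is a tie-breaker so the ordering stays deterministic.
    return (stripped.casefold(), title)


# Invented sample titles, for illustration only.
titles = ["Étrennes", "zèle", "Emblème", "école"]
sorted_titles = sorted(titles, key=accent_insensitive_key)
print(sorted_titles)
```

Without such a key, a plain codepoint sort would push "Étrennes" and "école" after "zèle", which is the kind of misordering described above.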
Fixed bugs with id references in the botanical and references files, and a bug with fw-links to page numbers in normalized documents.
Many filenames are in English because I created them, and they should be in French. Did a first round of renaming (the simplest ones) and dealt with links and other fallout today; will start tackling the more complicated stuff next week. Also fixed a bug with the eXist app so it now delivers the zipped corpus with the correct content-type.
Debugged, tested and deployed the corpus generation code, with an additional feature which generates a separate text corpus for each genre. New version deployed to eXist. Getting closer...
Added the build target that creates the downloadable zip containing corpus.xml and primarySource.txt; the former is the complete corpus, the latter only the plain-text content of the primary source document transcriptions. That now seems to be working OK. In the process I discovered that there are still some issues with missing hashes in the @rendition attributes of <zone> elements in the image markup docs. Ideally I'll fix that in the original source files and then fix any fallout resulting from it in the static build.
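A quick way to find the affected zones might look like the following Python sketch. It assumes the image markup docs are in the TEI namespace and that every @rendition token without a colon is meant to be a local pointer starting with "#"; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def zones_missing_hash(xml_text):
    """List (xml:id, bad tokens) for zones whose @rendition lacks a '#'."""
    root = ET.fromstring(xml_text)
    problems = []
    for zone in root.iter(f"{{{TEI_NS}}}zone"):
        tokens = zone.get("rendition", "").split()
        # Assumption: a token with a colon is a full or private-scheme URI
        # and is left alone; anything else should be a "#id" pointer.
        bad = [t for t in tokens if not t.startswith("#") and ":" not in t]
        if bad:
            problems.append((zone.get(XML_ID), bad))
    return problems


# Invented sample, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><facsimile><surface>
<zone xml:id="z1" rendition="#box"/>
<zone xml:id="z2" rendition="box #label"/>
</surface></facsimile></TEI>"""

print(zones_missing_hash(SAMPLE))
```

Running something like this over the source files before the static build would give a list of zones to correct by hand or by script.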
I've fixed the internal linking issues through a two-stage process:
- When our original XML documents are processed to create the "clean" XML for public consumption, links to HTML pages and so on are turned into private URI scheme targets with a site: prefix and a corresponding prefixDef in the header.
- The HTML rendering now handles these links.
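As a rough illustration of the second stage, this Python sketch dereferences a site: link using values modelled on the TEI prefixDef attributes (@ident, @matchPattern, @replacementPattern). The actual rendering is not done in Python; the dictionary, the function name, and the https base URL are assumptions made for the example.

```python
import re

# Hypothetical prefixDef data: the key plays the role of @ident, and the two
# patterns mirror @matchPattern and @replacementPattern. The base URL,
# including the https scheme, is an assumption.
PREFIX_DEFS = {
    "site": {
        "matchPattern": r"(.+)",
        "replacementPattern": "https://mariage.uvic.ca/$1",
    },
}


def resolve_private_uri(target):
    """Expand a private-scheme link such as site:foo.html; pass others through."""
    prefix, sep, rest = target.partition(":")
    definition = PREFIX_DEFS.get(prefix)
    if not sep or definition is None:
        return target  # not a known private scheme, so leave it untouched
    match = re.fullmatch(definition["matchPattern"], rest)
    if match is None:
        return target
    # TEI replacement patterns use $1; Python templates use \g<1>.
    template = re.sub(
        r"\$(\d+)",
        lambda m: rf"\g<{m.group(1)}>",
        definition["replacementPattern"],
    )
    return match.expand(template)


print(resolve_private_uri("site:toc_gravure.html"))
```

Links with ordinary schemes or local fragments fall through unchanged, so the same function can be applied to every link target in a document.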
I've fixed a lot of individual links throughout the collection too in the process of doing that. The main things remaining are the issue of links between the references files, the naming of those files, and the question of whether we still need to offer the corpus.xml and corpus.txt versions for download.
After working on diagnostics fixes, I see that there are some major changes we need to consider for consistency's sake in the XML:
- The references.xml and botanical.xml files respectively are converted to noms_propres.html and terms_med.html in the output. This is a bit confusing, and it means that links to those specific web pages can't be encoded as normal links to the source XML documents, as they should be. I propose renaming the XML files to match the HTML output, and globally changing all the links throughout the corpus.
- DONE: There are many cases where we want to link to HTML files that are built for the site, but which don't have XML source files (such as toc_gravure.html). In the HTML, these links of course point at nothing. I propose that we adopt a private URI scheme with a site: prefix for such links (e.g. site:toc_gravure.html), declared with a prefixDef, and dereference it as mariage.uvic.ca/toc_gravure.html. This also applies to schema and ODD links.
- FIXED: There are a couple of cases where the references/noms_propres file links to the botanical, and vice versa. Because we don't expect these links, those elements are not imported into the back matter of the files when the XML is expanded, but the links are converted to local links. In the website context, this doesn't cause a problem; when the target is not there, the JS just gets it by AJAX. But the XML documents are not strictly valid. How to handle this? Ideally we should import that stuff.
- DONE: The normalized texts remove all the forme work (fw) elements, but in cases such as Le Bon Mariage, these contain page numbers with @id attributes which are the targets of links in TOCs etc. Make sure these targets are converted into the marginal page numbers we use instead.
There are TOCs both at the beginning and the end of Le Bon Mariage, and they're a display problem because they were done as special list elements rather than tables. I've re-encoded them all as table[@type='primarySourceToc'], which makes it far simpler to display them properly.