Category: Activity log


Permalink 12:07:46 pm, by mholmes, 27 words, 17 views   English (CA)
Categories: Activity log; Mins. worked: 180

Corpus stuff finished

Debugged, tested and deployed the corpus generation code, with an additional feature that generates a separate text corpus for each genre. New version deployed to eXist. Getting closer...


Permalink 02:52:07 pm, by mholmes, 95 words, 18 views   English (CA)
Categories: Activity log; Mins. worked: 180

Wrote downloadable-corpus generation code

Added the build target that creates the downloadable zip containing corpus.xml and primarySource.txt: the former is the complete corpus, the latter only the plain-text content of the primary source document transcriptions. That now seems to be working OK. In the process I discovered that there are still some issues with missing hashes in the @rendition attributes of <zone> elements in the image markup docs. Ideally I'll fix that in the original source files and then fix any fallout in the static build.
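
As a rough illustration of the packaging step described above, here is a minimal Python sketch. It is not the project's build target (which is not shown in this log); the function name build_corpus_zip and its parameters are hypothetical, and only the two file names come from the entry itself.

```python
import io
import zipfile


def build_corpus_zip(corpus_xml: str, primary_text: str) -> bytes:
    """Return the bytes of a zip holding corpus.xml and primarySource.txt.

    Hypothetical helper: the real build target is assumed to do something
    broadly similar, but its implementation is not shown in the log.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # writestr() encodes str content as UTF-8.
        archive.writestr("corpus.xml", corpus_xml)
        archive.writestr("primarySource.txt", primary_text)
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch free of file-system assumptions; a real build would more likely write straight to the output directory.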


Permalink 04:32:08 pm, by mholmes, 111 words, 21 views   English (CA)
Categories: Activity log; Mins. worked: 180

Fixed internal linking issues

I've fixed the internal linking issues through a two-stage process:

  • When our original XML documents are processed to create the "clean" XML for public consumption, links to HTML pages and so on are turned into private URI scheme targets with a site: prefix and a corresponding prefixDef in the header.
  • The HTML rendering now handles these links.
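
To make the two stages above a little more concrete, here is a minimal Python sketch of how a renderer might dereference a site: pointer using a prefixDef-style match/replacement pair. Everything in it is an assumption for illustration: the PREFIX_DEFS table, the resolve_pointer name, and especially the "./" replacement target, which is not the project's actual dereferencing rule. TEI's own replacementPattern syntax uses $1; the sketch uses Python's \1 instead.

```python
import re

# Hypothetical prefix table, loosely modelled on TEI <prefixDef>
# (@ident, @matchPattern, @replacementPattern). The replacement shown
# here is illustrative only.
PREFIX_DEFS = {
    "site": {"match": r"(.+)", "replace": r"./\1"},
}


def resolve_pointer(pointer, prefix_defs=PREFIX_DEFS):
    """Expand a private-scheme pointer; return anything else unchanged."""
    scheme, sep, rest = pointer.partition(":")
    if not sep or scheme not in prefix_defs:
        return pointer  # ordinary link, or a scheme we do not define
    definition = prefix_defs[scheme]
    match = re.fullmatch(definition["match"], rest)
    if match is None:
        return pointer
    return match.expand(definition["replace"])
```

Pointers without a known prefix (ordinary file links, http: URLs) pass through untouched, so the same function can be applied to every link in a document.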

In the process I've also fixed a lot of individual links throughout the collection. The main things remaining are the issue of links between the references files, the naming of those files, and the question of whether we still need to offer the corpus.xml and corpus.txt versions for download.

Permalink 11:27:03 am, by mholmes, 307 words, 22 views   English (CA)
Categories: Activity log, Tasks; Mins. worked: 180

Stuff to fix and/or discuss for the static build/new site

After working on diagnostics fixes, I see that there are some major changes we need to consider for consistency's sake in the XML:

  • The references.xml and botanical.xml files are converted to noms_propres.html and terms_med.html respectively in the output. This is a bit confusing, and it means that links to those specific web pages can't be encoded as normal links to the source XML documents, as they should be. I propose renaming the XML files to match the HTML output, and globally changing all the links throughout the corpus.
  • DONE: There are many cases where we want to link to HTML files that are built for the site but have no XML source files (such as toc_gravure.html). These links in the HTML of course point at nothing. I propose that we adopt a prefixDef so that such links take the form site:toc_gravure.html, and dereference it as [...]. This also applies to schema and ODD links.
  • FIXED: There are a couple of cases where the references/noms_propres file links to the botanical file, and vice versa. Because we don't expect these links, the target elements are not imported into the back matter of the files when the XML is expanded, yet the links are still converted to local links. In the website context this doesn't cause a problem; when the target is not there, the JS just fetches it by AJAX. But the XML documents are not strictly valid. How to handle this? Ideally we should import that stuff.
  • DONE: The normalized texts remove all the forme works, but in cases such as Le Bon Mariage, these contain page numbers with @id attributes which are the targets of links in TOCs etc. Make sure these targets are converted into the marginal page numbers we use instead.


Permalink 05:03:38 pm, by mholmes, 31 words, 22 views   English (CA)
Categories: Activity log; Mins. worked: 60

HTML fragments are no more

CC rewrote the About page, which prompted me to fix those site pages in their XML format rather than continuing to maintain them in HTML. The whole dataset is now TEI.


Permalink 05:45:18 pm, by mholmes, 62 words, 22 views   English (CA)
Categories: Activity log; Mins. worked: 240

Fixed bugs

Fixed a number of things today:

  • Footnotes are now working even when embedded in split lists. I've added an earlier step which puts their number in @n.
  • The "Tous" TOC now includes gravures, which it didn't before.
  • The dates in TOCs correctly reflect the certainty and ranges of unknown dates.
  • XML files are now displaying.
  • The X-Frame-Options header is set to DENY.


Permalink 05:02:54 pm, by mholmes, 200 words, 11 views   English (CA)
Categories: Activity log; Mins. worked: 180

Half-done on a bug in list handling

There is a notorious problem in converting TEI lists to HTML: if a list contains embedded things (such as forme works or page breaks) that will produce element content between items, you have to split the list into separate component lists; otherwise non-list elements end up embedded directly in the list, which is invalid HTML. I had some code that was supposed to do this with for-each-group, but it wasn't working. I debugged and fixed it, so lists are now coming out OK. However, there's now a problem with notes embedded in these structures: because they're processed as part of a constructed fragment, they lose their context, so they all end up numbered "1" and don't generate a popup correctly. This is exemplified in the TOC for Forest Nuptiale. Possible solutions:

  • Do a first pass to pre-process notes to give them an @n attribute, then do the note processing based on that attribute rather than on counting preceding notes.
  • Do a first pass to split out the lists in XML, so that the problem doesn't arise.
  • Instead of using the very limited ol/ul/li elements in XHTML5, use simple divs styled with CSS, with display: list-item on the items.

I'm still thinking on this.
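
For what it's worth, the first option combines naturally with the list splitting. The following Python sketch (the real code is XSLT, which is not shown here) numbers the notes in a first pass and then splits a list into runs of adjacent items, using itertools.groupby as a stand-in for for-each-group with group-adjacent. The function names are hypothetical, and the TEI namespace is left out for brevity.

```python
import itertools
import xml.etree.ElementTree as ET


def number_notes(root):
    """First pass: stamp each <note> with its document-order position in @n."""
    for position, note in enumerate(root.iter("note"), start=1):
        note.set("n", str(position))


def split_list(list_el):
    """Second pass: each run of adjacent <item>s becomes its own <list>;
    anything else (page breaks, forme works) is emitted between the lists."""
    pieces = []
    for is_item, run in itertools.groupby(list_el, key=lambda child: child.tag == "item"):
        children = list(run)
        if is_item:
            part = ET.Element("list")
            part.extend(children)
            pieces.append(part)
        else:
            pieces.extend(children)
    return pieces


doc = ET.fromstring(
    "<list><item>A<note>first</note></item><pb/>"
    "<item>B<note>second</note></item><item>C</item></list>"
)
number_notes(doc)
pieces = split_list(doc)
```

Because @n is set before the split, the note in the second component list still carries its original number, so later note processing does not need to count preceding notes.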


Permalink 03:40:59 pm, by mholmes, 70 words, 58 views   English (CA)
Categories: Activity log; Mins. worked: 180

Refactoring for eXist 3.0

eXist 3.0 has a bug with namespaces which can be worked around by refactoring a bit; since the refactoring actually produces better code, I've done it for several projects including Mariage. I've also reworked the search functionality so that it handles the problem case of a large document with hundreds of hits. Other layout and style bugfixes are also done, and a couple of obvious additions made to the stopword list.


Permalink 03:44:59 pm, by mholmes, 82 words, 62 views   English (CA)
Categories: Activity log; Mins. worked: 120

Generic search functionality work

Made a number of tweaks to the way the search currently works, but principally worked on generic code in the hcmc/xquery/xq-utils.xqm library to convert user-friendly search-box input into the XML syntax that eXist can use to talk to Lucene. This seems to be working well, although I haven't yet found a way to put it into practice because we're still using a string-construct-and-eval approach to filtered queries. It may be just a case of using the XQuery serialize() function.
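
To show the kind of conversion meant here, this is a minimal Python sketch, not the xq-utils.xqm code itself (which is XQuery and not reproduced in this log). It assumes a simple search-box convention of +term for required, -term for excluded and bare terms for optional, and it emits the query/bool/term/wildcard elements with an occur attribute as I understand eXist's documented XML query syntax for Lucene; the element names should be checked against the eXist documentation. The function name to_lucene_xml is hypothetical, and quoted phrases are not handled.

```python
import xml.etree.ElementTree as ET


def to_lucene_xml(search_box_input: str) -> str:
    """Turn '+must -not plain' style input into an XML query string."""
    query = ET.Element("query")
    boolean = ET.SubElement(query, "bool")
    for token in search_box_input.split():
        if token.startswith("+"):
            occur, text = "must", token[1:]
        elif token.startswith("-"):
            occur, text = "not", token[1:]
        else:
            occur, text = "should", token
        if not text:
            continue  # a lone "+" or "-" carries no term
        # Terms containing * or ? are emitted as wildcard clauses.
        tag = "wildcard" if ("*" in text or "?" in text) else "term"
        clause = ET.SubElement(boolean, tag, occur=occur)
        clause.text = text
    return ET.tostring(query, encoding="unicode")
```

Building the query as an element tree rather than by string concatenation means special characters in the user's input are escaped automatically, which is one of the attractions of moving away from a string-construct-and-eval approach.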


Permalink 04:35:19 pm, by mholmes, 235 words, 61 views   English (CA)
Categories: Activity log; Mins. worked: 360

More work on search and nuances of search links

As I hack away at search testing, I'm discovering more and more little tweaks that are more than nice-to-have. Today I fixed a bunch of bugs in processing of ambitious search strings (quoted phrases are not supported yet, although I have half-a-plan for that). I also decided that search-string highlighting in a document that you have found is better done using a much simpler search string than the one you used to find documents in the collection (for instance, you don't want minused terms in the document highlighter because it causes eXist to return nothing, for some reason). So I now have a clever conversion of the original search string that is appended to the URL of the document link in the initial search results.
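
The conversion described above could look roughly like this in Python. It is only a sketch under stated assumptions: the site's real conversion is not shown in this log, the function names are hypothetical, and the query parameter name q is invented for the example.

```python
from urllib.parse import urlencode


def highlight_query(search_string: str) -> str:
    """Reduce a collection-level search string to one suitable for
    in-document highlighting: drop minused terms, strip leading '+'."""
    kept = []
    for token in search_string.split():
        if token.startswith("-"):
            continue  # excluded terms should never be highlighted
        token = token.lstrip("+")
        if token:
            kept.append(token)
    return " ".join(kept)


def result_link(doc_url: str, search_string: str, param: str = "q") -> str:
    """Append the simplified search string to a document link."""
    simplified = highlight_query(search_string)
    if not simplified:
        return doc_url
    return doc_url + "?" + urlencode({param: simplified})
```

For instance, a search for "+mariage -femme amour" would produce a link that asks the document view to highlight only "mariage" and "amour".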

I've also fixed the display of the gravures so that a search result link will pop up the containing annotation, and also so that a link to the id of an element which is not an annotation itself, but is inside one, will cause the annotation to be shown.

We're clearly down to minor tweaks at this stage, so we're close. PS is still working on a couple of cosmetic issues. I'm thinking that there should be some more sophisticated diagnostics to catch broken links; I don't think that check is currently finding links that point to an element in a document which is not one of the ref docs.
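
A more thorough link check might work along these lines. This Python sketch is an assumption about what such a diagnostic could do, not the project's existing check: it gathers every xml:id in every document, then tests each @target of the form file.xml#id (or #id) against that index, whichever document it points into. The function name and the in-memory docs mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_broken_links(docs):
    """docs maps file name -> XML string. Returns (source file, target)
    pairs whose file or fragment cannot be found in the collection."""
    ids = {}
    targets = []
    for name, xml in docs.items():
        root = ET.fromstring(xml)
        ids[name] = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
        for el in root.iter():
            # @target may hold several whitespace-separated pointers.
            for target in el.get("target", "").split():
                targets.append((name, target))
    broken = []
    for source, target in targets:
        file_part, _, fragment = target.partition("#")
        if ":" in file_part:
            continue  # external or private-scheme link; out of scope here
        file_name = file_part or source  # bare "#id" points into the same file
        if file_name not in ids or (fragment and fragment not in ids[file_name]):
            broken.append((source, target))
    return broken


docs = {
    "a.xml": '<text><p xml:id="p1"><ref target="b.xml#n1">ok</ref>'
             '<ref target="b.xml#missing">bad</ref><ref target="#p1">self</ref></p></text>',
    "b.xml": '<text><note xml:id="n1">x</note></text>',
}
broken = find_broken_links(docs)
```

Indexing ids for the whole collection first means the check is not limited to links into the reference documents.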



Should one marry? Panurge's question proves unavoidable in the West, especially from the Counter-Reformation onward. From the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, the attempt to renew marriage ran up against, in France, the growing intervention of the monarchy in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous but favoured the proliferation of the documents that are the subject of this site: the "nuptial imaginary" is made up of various textual genres, each with its own character, but all dealing with the fears, desires and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new question of conjugal union. The focus for the moment is on the misogamous texts and images that form part of a revival of the Querelle des femmes during the first 25 years of the seventeenth century.

