Generated a single-file corpus from the collection, and ran css.xsl on it to generate a "stylesheet" which could be validated. There were over 300 errors, so I've been working through them, fixing typos and other problems with CSS in @rend attributes. I've got about half of them done so far.
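Here is a Python sketch of the idea of turning @rend values into something a CSS validator can check. This is not css.xsl itself, whose internals aren't described here. It assumes every @rend value is a run of plain CSS declarations. The selectors (.r0, .r1, ...) are hypothetical placeholders whose only job is to make each value a well-formed rule. A comment naming the source element precedes each rule so that validator errors can be traced back.

```python
import xml.etree.ElementTree as ET


def rend_to_stylesheet(xml_text: str) -> str:
    """Wrap each @rend value in a dummy CSS rule so a validator can check it.

    Assumes @rend holds plain CSS declarations, e.g. "font-style: italic;".
    The .rN selectors are arbitrary placeholders.
    """
    root = ET.fromstring(xml_text)
    rules = []
    for elem in root.iter():
        rend = elem.get("rend")
        if rend is None:
            continue
        # Strip a "{namespace}" prefix so the comment shows e.g. <hi>
        local_name = elem.tag.rsplit("}", 1)[-1]
        rules.append(f"/* <{local_name}> */\n.r{len(rules)} {{ {rend.strip()} }}")
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    # "centre" is a deliberate typo that a CSS validator should flag
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>'
        '<hi rend="font-style: italic;">a</hi><p rend="text-align:centre">b</p>'
        "</text></TEI>"
    )
    print(rend_to_stylesheet(sample))
```

You can pass the output to any CSS validator. Each reported error then points at a numbered rule and the element it came from.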
Fixed the search bug that was returning multiple copies of the same search hit in the results. The cause was a failure to account for cases where multiple search hits shared the same parent. Also found and fixed a bunch of bad CSS values in @rend attributes. I need to do a formal search through the whole corpus for these.
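The following sketch shows the general shape of the duplicate-hit fix. The search code itself isn't shown here, so the Hit record and its fields are hypothetical. Suppose results are generated once per hit. A parent containing several hits then appears once for each hit. Grouping hits by their parent first yields one result per parent, carrying all of its hit positions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str     # hypothetical: document identifier
    parent_id: str  # hypothetical: id of the element containing the hit
    offset: int     # hypothetical: position of the hit within that element


def group_hits_by_parent(hits: list[Hit]) -> list[tuple[str, str, list[int]]]:
    """Return one result per (doc_id, parent_id) with all of its hit offsets.

    Emitting one result per hit would repeat a parent once for every hit it
    contains; grouping first removes those duplicates.
    """
    grouped: dict[tuple[str, str], list[int]] = {}
    for hit in hits:
        grouped.setdefault((hit.doc_id, hit.parent_id), []).append(hit.offset)
    # dicts keep insertion order, so results follow the order of first hits
    return [(doc, parent, sorted(offsets)) for (doc, parent), offsets in grouped.items()]


if __name__ == "__main__":
    hits = [Hit("d1", "p1", 40), Hit("d1", "p1", 5), Hit("d1", "p2", 0)]
    print(group_hits_by_parent(hits))  # [('d1', 'p1', [5, 40]), ('d1', 'p2', [0])]
```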
I noticed the other day that when you clicked on a search hit in the search results, the link took you to the relevant document, but not to the specific hit you clicked on. I've now fixed that, but another bug persists: for some hits in some documents, the same hit is returned multiple times in the results. Working on that now...
We're currently using a rather messy textual classification method based on <textClass> and <classCode> elements pointing at a non-existent scheme. What's more, our classification codes seem to overlap a bit and fall into two distinct groups. I think it's time to revisit this aspect of our encoding and put it on a sound formal basis. To that end, I have:
- added global_metadata.xml, in which we can centralize a variety of metadata and link to it (eventually this should include things such as availability/licensing);
- expanded the document status values: previously, <revisionDesc>/@status could only be set to "proofing", but we now have a set of values which I think will be more useful.
I think we need two separate taxonomies, one for text types and one for content types (e.g. "prose" would be a text type, "religion" a content type). Then we can add any number of <textClass> elements to any given document, each pointing at a specific scheme and code. We can use these to filter documents in specialist TOCs and in the search interface.
We should also look for any existing taxonomies that we could adopt.
This arises out of my preparation of the documents for submission to the TAPAS project, which required some standardization of data in the headers. I also removed the pointless "An Electronic Edition" subtitle from all our documents, and tweaked a couple of other things.
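Filtering on the proposed classification might look something like the Python sketch below. It assumes each classification is recorded with a TEI <catRef> inside <textClass>; <catRef> is the TEI element for pointing at a category in a <taxonomy>. The scheme and category ids (#textTypes, #contentTypes, #prose, #religion) are hypothetical placeholders, not values from our files.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def categories(doc_xml: str) -> set[tuple[str, str]]:
    """Collect (scheme, category) pairs from every <catRef> inside <textClass>."""
    root = ET.fromstring(doc_xml)
    pairs = set()
    for cat_ref in root.iterfind(".//tei:textClass/tei:catRef", TEI_NS):
        scheme = cat_ref.get("scheme", "")
        # @target may contain several space-separated pointers
        for target in cat_ref.get("target", "").split():
            pairs.add((scheme, target))
    return pairs


def filter_docs(docs: dict[str, str], scheme: str, category: str) -> list[str]:
    """Return the ids of documents classified under `category` in `scheme`."""
    return sorted(doc_id for doc_id, xml in docs.items() if (scheme, category) in categories(xml))


if __name__ == "__main__":
    doc = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>'
        '<textClass><catRef scheme="#textTypes" target="#prose"/></textClass>'
        '<textClass><catRef scheme="#contentTypes" target="#religion"/></textClass>'
        "</profileDesc></teiHeader></TEI>"
    )
    print(filter_docs({"doc1": doc}, "#contentTypes", "#religion"))  # ['doc1']
```

Keeping the scheme in each pair means the same category id could exist in both taxonomies without collisions.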
Met with CC to discuss the grant application and the TRUTH presentation in September, and also fixed a couple of things in the db (publishing Le Blanc).
Met with CC to go over plans for the application, and tweak the French translation of the technical description we wrote the other week.
Met with CC to write a preliminary draft of a section of the grant application dealing with the proposed normalization and search functionality. This was a useful exercise, forcing me to make all the details explicit and to explain them in clearer terms than I have been using with myself. The plan still looks good, and I'm looking forward to making more detailed plans based on it. These would especially cover the creation of normalization rules and an automated system for testing them and evaluating the results.
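This is a minimal Python sketch of the kind of automated testing system described above. Rules are applied in order, and the output is compared against a list of expected modern forms. The rule format (regex, replacement) is a hypothetical illustration, not our actual rulesets, and so are the sample rules and test words.

```python
import re

# Hypothetical rule format: (regex pattern, replacement), applied in order
Rule = tuple[str, str]


def normalize(word: str, rules: list[Rule]) -> str:
    """Apply every rule in sequence to a single word."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def evaluate(rules: list[Rule], cases: list[tuple[str, str]]):
    """Run the rules over (original, expected) pairs.

    Returns (accuracy, failures), where failures lists (original, got, expected).
    """
    failures = []
    for original, expected in cases:
        got = normalize(original, rules)
        if got != expected:
            failures.append((original, got, expected))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures


if __name__ == "__main__":
    rules = [(r"^est", "ét"), (r"oit$", "ait")]  # hypothetical sample rules
    cases = [("estoit", "était"), ("faisoit", "faisait"), ("estre", "être")]
    accuracy, failures = evaluate(rules, cases)
    print(f"{accuracy:.0%}", failures)  # 67% [('estre', 'étre', 'être')]
```

The sample run shows how a failure surfaces: *estre* becomes *étre* rather than *être*, which flags a rule that needs tuning.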
Tested out Franscriptor.com with some sample text from our content, to see what it's doing and to try to deduce how (it's a black box). It offers to "dissimiler" and "détilder" the text, but it's not clear exactly what that means. This is what I've learned:
GM is now linking from the Ville-Thierry to existing references.
Met with CC and examined some of the outcomes from our rulesets. There's obviously a huge amount of tuning still to do. It's also clear, though, that before each rule is run, the word needs to be checked against the dictionary in case it's already a valid form. If it is, we don't need to keep working on it. I've now implemented that by turning the spell-check dictionary into an XML file, which is then indexed with xsl:key (I tried other string-finding methods, but they were much slower). The transformation now takes substantially longer than it used to, but it's clearer what's happening. One issue, of course, might be archaic forms in the spell-check dictionary.
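Here is a Python sketch of the dictionary check. A set stands in for the xsl:key index; both give fast lookups instead of a scan through the whole word list. The rules and dictionary entries are hypothetical examples. The check runs before every rule, including the first, so a word that is already a valid form is never touched.

```python
import re


def normalize_with_dictionary(word: str, rules: list[tuple[str, str]], dictionary: set[str]) -> str:
    """Apply rules in order, stopping as soon as the word is a dictionary form."""
    for pattern, replacement in rules:
        if word in dictionary:
            break  # already a valid form, so leave it alone
        word = re.sub(pattern, replacement, word)
    return word


if __name__ == "__main__":
    rules = [(r"^est", "ét"), (r"oit$", "ait")]  # hypothetical sample rules
    dictionary = {"est", "était"}                 # hypothetical dictionary entries
    print(normalize_with_dictionary("est", rules, dictionary))     # est
    print(normalize_with_dictionary("estoit", rules, dictionary))  # était
    # An archaic form in the dictionary halts normalization immediately:
    print(normalize_with_dictionary("estoit", rules, dictionary | {"estoit"}))  # estoit
```

Without the check, the sample rule ^est → ét would turn the modern word *est* into *ét*. The last line illustrates the archaic-forms risk: if *estoit* were in the dictionary, normalization would stop before it started.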
Another issue is u/v variation: when one rule changes u to v or vice versa, a later rule often changes it back. A better approach would probably be to convert every u and v to an otherwise unused placeholder symbol. We could then write context-based rules for changing that symbol to the appropriate letter.
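The placeholder approach can be sketched in Python as a two-stage process. First, every u and v becomes a single placeholder. Here that is U+E000, a Private Use Area character assumed never to occur in the texts. Then context rules decide which letter each placeholder becomes. The three context rules are hypothetical, simplified heuristics for lowercase single words, not a tested ruleset.

```python
import re

PLACEHOLDER = "\ue000"  # Private Use Area character, assumed absent from the texts
VOWELS = "aeiouyàâéèêëîïôû"

# Hypothetical context rules, applied in order to a single lowercase word
CONTEXT_RULES = [
    (f"(?<=[{VOWELS}]){PLACEHOLDER}(?=[{VOWELS}])", "v"),  # between vowels: auoir -> avoir
    (f"^{PLACEHOLDER}(?=[{VOWELS}])", "v"),                # word-initial before a vowel: vie
    (PLACEHOLDER, "u"),                                     # everything else: vn -> un
]


def normalize_uv(word: str) -> str:
    # Stage 1: collapse u and v into one placeholder so no rule can flip them back
    word = word.replace("u", PLACEHOLDER).replace("v", PLACEHOLDER)
    # Stage 2: resolve each placeholder from its context
    for pattern, replacement in CONTEXT_RULES:
        word = re.sub(pattern, replacement, word)
    return word


if __name__ == "__main__":
    for w in ["auoir", "vn", "vie", "diuin", "vniuers"]:
        print(w, "->", normalize_uv(w))
```

Every u/v passes through the placeholder exactly once, so no later rule can flip a letter back. The heuristics would still need the same tuning as the rest of the ruleset: *louer*, for example, would come out as *lover*.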
Should one marry? Panurge's question proves unavoidable in the West, especially from the Counter-Reformation onward. In France, from the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, attempts to renew marriage ran up against the monarchy's growing intervention in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous, but it fostered the proliferation of the documents that are the subject of this site. This "nuptial imaginary" is made up of various textual genres, each with its own character. All of them, however, deal with the fears, desires, and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new problem of conjugal union. For the moment, the focus is on the misogamous texts and images that formed part of a revival of the Querelle des femmes during the first 25 years of the seventeenth century.