Category: Activity log

27/08/12

Permalink 04:11:58 pm, by mholmes, 50 words, 130 views   English (CA)
Categories: Activity log; Mins. worked: 120

CSS error correction

Generated a single-file corpus from the collection, and ran css.xsl on it to generate a "stylesheet" which could be validated. There were over 300 errors, so I've been working through them, fixing typos and other problems with CSS in @rend attributes. I've got about half of them done so far.
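
A rough sketch of the kind of check involved, written in Python rather than XSLT. It is not css.xsl and not a real CSS validator; it only assumes that each @rend value should be a semicolon-separated list of "property: value" declarations, and flags anything that does not have that shape. The function name, the regular expression and the sample markup are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Very loose shape check for one declaration: "property: value".
# This is an assumption for illustration; it does not validate CSS properly.
DECLARATION = re.compile(r"^\s*-?[a-zA-Z][a-zA-Z-]*\s*:\s*[^:;{}]+$")

def suspicious_rend_values(xml_text):
    """Return (tag, rend) pairs whose @rend does not look like CSS declarations."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter():
        rend = el.get("rend")
        if rend is None:
            continue
        # Drop the empty piece left behind by a trailing semicolon.
        declarations = [d for d in rend.split(";") if d.strip()]
        if not declarations or not all(DECLARATION.match(d) for d in declarations):
            problems.append((el.tag, rend))
    return problems

# Made-up sample standing in for the single-file corpus.
sample = """<text>
  <hi rend="font-style: italic;">ok</hi>
  <hi rend="font-style italic">missing colon</hi>
  <p rend="text-align: center; margin-left: 2em">ok</p>
  <p rend="text-align:: center">doubled colon</p>
</text>"""

for tag, rend in suspicious_rend_values(sample):
    print(tag, repr(rend))
```

A check like this only catches malformed declarations; misspelled property names or invalid values would still need a proper validator.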

Permalink 02:37:01 pm, by mholmes, 64 words, 115 views   English (CA)
Categories: Activity log; Mins. worked: 60

Fixed search bug (and some encoding issues)

Fixed the search bug that was returning multiple copies of the same search hit in the results; it was caused by failing to take account of cases where there were multiple search hits with the same parent. Also found a bunch of bad CSS values in @rend attributes and fixed them. I need to do a formal search through the whole corpus for these.
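
The general shape of the fix can be sketched as follows. This is not the project's search code; it assumes, purely for illustration, that each hit is a (document id, parent id, matched text) tuple, and it collapses hits that share a parent into a single result entry instead of emitting the parent once per hit.

```python
def group_hits_by_parent(hits):
    """Collapse hits sharing a (document, parent) pair into one result entry.

    Order of first appearance is preserved (dicts keep insertion order).
    """
    grouped = {}
    for doc_id, parent_id, matched_text in hits:
        grouped.setdefault((doc_id, parent_id), []).append(matched_text)
    return [(doc, parent, texts) for (doc, parent), texts in grouped.items()]

# Hypothetical hits: two of them fall inside the same parent element.
hits = [
    ("doc1", "p3", "mariage"),
    ("doc1", "p3", "mariages"),
    ("doc1", "p7", "mariage"),
]

for doc, parent, texts in group_hits_by_parent(hits):
    print(doc, parent, texts)
```

With this grouping, parent "p3" appears once in the results with both matches attached, rather than twice.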

Permalink 01:10:45 pm, by mholmes, 63 words, 103 views   English (CA)
Categories: Activity log; Mins. worked: 60

Fixed a search bug; one more outstanding

I noticed the other day that when you clicked on a search hit in the search results, the link took you to the relevant document, but not to the specific hit you clicked on. I've now fixed that, but another one persists; for some hits in some documents, the same hit is being returned multiple times in the results. Working on that now...

08/08/12

Permalink 03:55:14 pm, by mholmes, 293 words, 104 views   English (CA)
Categories: Activity log; Mins. worked: 180

Formalizing our text categorization

We're currently using a rather messy textual classification method based on the use of <textClass> and <classCode> pointing at a non-existent scheme, and what's more, our classification codes seem to overlap a bit, and fall into two distinct classes. I think it's time to revisit this aspect of our encoding, and put it on a sound formal basis. To that end, I have:

  • Created a new file in /mariage/ called global_metadata.xml, in which we can centralize a variety of metadata and link to it (this should include things such as availability/licensing, eventually).
  • Modified the ODD file and generated a new schema to allow for the creation of taxonomies. In the process, I also fixed the oddity whereby <revisionDesc>/@status could only be set to "proofing". We now have a set of document status values which I think will be more useful.
  • Created an initial taxonomy of textual types which matches what we currently have.
  • Summarized the issue for CC and asked for guidance on how to continue.

I think we need two separate taxonomies, one for text types and one for content types (e.g. prose vs religion). Then we can add any number of <textClass> elements to any given document, pointing at the specific scheme and code, and use these to filter documents in specialist TOCs and in the search interface.
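
A minimal sketch of how such filtering might work, assuming each document carries one <textClass> per taxonomy with a <classCode> whose @scheme points into global_metadata.xml. The fragment identifiers (#textTypes, #contentTypes), the category codes other than "prose" and "religion", and the helper names are hypothetical; the headers are stubs, not valid TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical scheme pointers; only the file name comes from the entry above.
TEXT_TYPES = "global_metadata.xml#textTypes"
CONTENT_TYPES = "global_metadata.xml#contentTypes"

def class_codes(tei_xml):
    """Return the set of (scheme, code) pairs from every <classCode> in a document."""
    root = ET.fromstring(tei_xml)
    return {
        (cc.get("scheme"), (cc.text or "").strip())
        for cc in root.findall(".//tei:textClass/tei:classCode", TEI_NS)
    }

def filter_documents(docs, scheme, code):
    """Return the sorted ids of documents classified with the given scheme and code."""
    return sorted(
        doc_id for doc_id, xml in docs.items() if (scheme, code) in class_codes(xml)
    )

def make_header(text_type, content_type):
    """Build a tiny TEI stub with one <textClass> per taxonomy (illustration only)."""
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>'
        f'<textClass><classCode scheme="{TEXT_TYPES}">{text_type}</classCode></textClass>'
        f'<textClass><classCode scheme="{CONTENT_TYPES}">{content_type}</classCode></textClass>'
        '</profileDesc></teiHeader></TEI>'
    )

docs = {
    "doc_a": make_header("prose", "religion"),
    "doc_b": make_header("verse", "religion"),
    "doc_c": make_header("prose", "satire"),
}

print(filter_documents(docs, TEXT_TYPES, "prose"))
print(filter_documents(docs, CONTENT_TYPES, "religion"))
```

Keeping the scheme in the lookup key means the same code could exist in both taxonomies without ambiguity.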

We should also presumably look for any existing applicable taxonomies that we could adopt.

This arises out of my preparation of the documents for submission to the TAPAS project, which required some standardization of data in the headers. I also removed the pointless "An Electronic Edition" subtitle from all our documents, and tweaked a couple of other things.

07/08/12

Permalink 05:31:26 pm, by mholmes, 27 words, 109 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 120

Meeting to go over grant app and presentation

Met with CC to discuss the grant application and the TRUTH presentation in September, and also fixed a couple of things in the db (publishing Le Blanc).

03/07/12

Permalink 04:30:10 pm, by mholmes, 24 words, 180 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 60

Meeting to review application plans

Met with CC to go over plans for the application, and tweak the French translation of the technical description we wrote the other week.

20/06/12

Permalink 11:21:49 am, by mholmes, 85 words, 185 views   English (CA)
Categories: Activity log, Academic; Mins. worked: 60

Meeting to write project description

Met with CC to write a preliminary draft of a section of the grant application dealing with the proposed normalization and search functionality. This was a useful exercise, forcing me to make all the details explicit, and explain them in clearer terms than I have been doing to myself. The plan still looks good, and I'm looking forward to making more detailed plans based on this (especially plans for the creation of normalization rules, and an automated system for testing them and evaluating the results).

23/05/12

Permalink 10:15:02 am, by mholmes, 98 words, 112 views   English (CA)
Categories: Activity log; Mins. worked: 60

Revisiting normalization

Tested out Franscriptor.com with some sample text from our content, to see what it's doing and to try to deduce how (it's a black box). It offers to "dissimiler" and "détilder" the text, but it's not clear exactly what that means. This is what I've learned:

  • It does nothing with long s, so that has to be normalized before submission.
  • It expands ligatures such as œ.
  • It does quite a good job with u/v normalization, although it failed with "oeuures".
  • Many anachronistic spellings survive unchanged ("luy", "bastir", "tousjours"), so it's clearly not trying to do modernization.
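
Since long s has to be dealt with before text goes to Franscriptor, the pre-processing step could be as small as the sketch below. The function name is made up, and it assumes long s appears as the single code point U+017F.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def normalize_long_s(text):
    """Replace every long s with a plain 's' before submitting the text."""
    return text.replace(LONG_S, "s")

print(normalize_long_s("tou\u017fjours"))
```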

30/04/12

Permalink 05:07:54 pm, by mholmes, 10 words, 114 views   English (CA)
Categories: Activity log; Mins. worked: 30

GM now working on references

GM is now linking from the Ville-Thierry to existing references.

22/03/12

Permalink 05:44:53 pm, by mholmes, 172 words, 82 views   English (CA)
Categories: Activity log; Mins. worked: 180

More work on normalization

Met with CC and examined some of the outcomes from our rulesets. There's obviously a huge amount of tuning still to do, but it's also clear that before each rule is run, the word needs to be checked against the dictionary in case it's already OK; if it is, then we don't need to keep working on it. I've now implemented that by turning the spell-check dictionary into an XML file which is then indexed with xsl:key (I tried other string-finding methods but they were much slower). The transformation now takes substantially longer than it used to, but it's clearer what's happening. One issue might be archaic forms in the spell-check dictionary, of course.
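
The control flow can be sketched in Python as below. This is not the XSLT implementation: a Python set stands in for the xsl:key index, and the two rules and the tiny dictionary are invented examples, not our actual ruleset.

```python
import re

# Hypothetical stand-in for the spell-check dictionary (a set gives hashed lookup).
DICTIONARY = {"lui", "bâtir", "pasteur"}

# Hypothetical rules as (pattern, replacement) pairs, applied in order.
RULES = [
    (r"y$", "i"),        # luy -> lui
    (r"as(?=t)", "â"),   # bastir -> bâtir
]

def normalize_word(word, rules, dictionary):
    """Apply rules in order, stopping as soon as the current form is in the dictionary."""
    current = word
    for pattern, replacement in rules:
        if current.lower() in dictionary:
            break  # already a known word: leave it alone
        current = re.sub(pattern, replacement, current)
    return current

for w in ["luy", "bastir", "pasteur"]:
    print(w, "->", normalize_word(w, RULES, DICTIONARY))
```

Without the dictionary check, the second rule would turn the perfectly good "pasteur" into "pâteur", which is the kind of damage the check is meant to prevent.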

Another issue is u/v variation. When we change one to the other, we often end up changing it back in a later rule. It seems likely that a better approach would be to change all u/v to another unused symbol, and then write rules based on context for changing that symbol to the appropriate output.
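
A minimal sketch of the placeholder idea, assuming lowercase input and a private-use code point that never occurs in the corpus. The single contextual rule here (placeholder at word start or after a vowel, and before a vowel, becomes "v"; everything else becomes "u") is a deliberately crude illustration, not a proposed ruleset; it would get words like "joue" wrong.

```python
import re

PLACEHOLDER = "\ue000"  # private-use code point, assumed absent from the corpus
VOWELS = "aeiouyàâéèêëîïôûœ"

# Hypothetical contextual rule: at word start or after a vowel, and before a vowel.
V_CONTEXT = re.compile(f"(?:^|(?<=[{VOWELS}])){PLACEHOLDER}(?=[{VOWELS}])")

def neutralize_uv(word):
    """Replace every u and v with the placeholder, so no rule sees either letter."""
    return re.sub("[uv]", PLACEHOLDER, word)

def resolve_uv(word):
    """Decide each placeholder once, from context, instead of flipping u/v back and forth."""
    neutral = neutralize_uv(word)
    resolved = V_CONTEXT.sub("v", neutral)
    return resolved.replace(PLACEHOLDER, "u")  # default for all remaining placeholders

for w in ["auoir", "vn", "vous", "pour"]:
    print(w, "->", resolve_uv(w))
```

Because every u and v is decided exactly once from the neutral form, a later rule cannot undo an earlier one.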

Mariage

Should one marry? Panurge's question proves unavoidable in the West, especially from the Counter-Reformation onward. From the opening of the Council of Trent in 1545 to the end of the reign of Louis XIV, the attempt to renew marriage collides in France with the growing intervention of the monarchy in an institution previously dominated by the Church. The encounter between these two authorities was tumultuous but favourable to the proliferation of the documents that are the subject of this site: the "nuptial imaginary" is made up of various textual genres, each with its own character, but all dealing with the fears, desires and fantasies that became increasingly visible in Ancien Régime society thanks to the debates raised by the new problem of the conjugal union. The emphasis for the moment is on the misogamous texts and images that form part of a revival of the Querelle des femmes during the first 25 years of the seventeenth century.
