I'm marking this task as completed, because we have identified most of the problem areas. We now need to progress to creating proper markup guidelines.
I need to work through the existing markup and look at how some key items are handled:
- sourceDesc bibliographical info (should be formal, probably using biblStruct)
- titles and headings
- pagination/page breaks
- major divisions (are there any nested divs?)
- other line-level tagging practices
Once that's done, we can progress to the creation of some formal guidelines.
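As a starting point for that review, here is a minimal sketch of a survey script. It assumes the text files are well-formed TEI-style XML; the function names and the sample document are made up for illustration and are not part of the project. It counts the elements listed above and reports how deeply divs are nested.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Elements of interest for the markup review (taken from the list above).
SURVEYED = ("biblStruct", "docTitle", "head", "pb")

def local_name(tag):
    """Drop any '{namespace}' prefix so the survey works with or without a TEI namespace."""
    return tag.rsplit("}", 1)[-1]

def survey_markup(xml_text):
    """Count the surveyed elements and measure the deepest nesting of div elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    max_depth = 0

    def walk(elem, div_depth):
        nonlocal max_depth
        name = local_name(elem.tag)
        if name in SURVEYED:
            counts[name] += 1
        # Treat div and numbered divs (div0, div1, ...) as divisions.
        if re.fullmatch(r"div\d?", name):
            div_depth += 1
            max_depth = max(max_depth, div_depth)
        for child in elem:
            walk(child, div_depth)

    walk(root, 0)
    return {"counts": dict(counts), "max_div_depth": max_depth}

# Hypothetical sample document, for illustration only.
sample = (
    "<TEI><text><front><docTitle><titlePart>T</titlePart></docTitle></front>"
    "<body><div><head>A</head><pb n='1'/><div><head>B</head></div></div></body>"
    "</text></TEI>"
)
print(survey_markup(sample))
```

Running something like this over every text file would give a rough picture of current practice before any guidelines are written; a max_div_depth above 1 means a file contains nested divs.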
The text files represent titles in two ways: with formal docTitle tags, and with head tags inside divs. Whether this is consistent depends on the type of document being marked up, but there was some concern that documents using head instead of docTitle were not having their titles rendered.
I checked this against the newly edited XML files, and both types of markup are working: where there is a docTitle tag, it is rendered as a separate block at the beginning of the document, and where there are head tags in divs, these are rendered as headings in the main block of the document. This seems to me a reasonable state of affairs; some documents have both docTitle and head elements, and it makes sense to separate them. Pending a review of encoding practices, which will lead to a set of formal guidelines, I would say that the current behaviour on the site is acceptable. If you think there should be changes, please comment in response to this.
Examples:
- Using both docTitle and head: La Gazette
- Using only docTitle: ARREST CONTRE LES CHASTREZ (this document has a titlePart, but it isn't marked as type="main", so it doesn't appear bold/large. Why?)
- Using only head tags: La Bourgeoise d’esprit
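To find other documents in the same situation as ARREST CONTRE LES CHASTREZ, a small check along these lines might help. It is only a sketch: the function name and the sample XML are hypothetical, and it simply lists titlePart elements that lack type="main" without explaining why they were encoded that way.

```python
import xml.etree.ElementTree as ET

def title_parts_without_main(xml_text):
    """Return the text of every titlePart whose type attribute is not 'main'."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Compare on the local name so a TEI namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "titlePart" and elem.get("type") != "main":
            found.append("".join(elem.itertext()).strip())
    return found

# Hypothetical fragment modelled on the docTitle-only example above.
fragment = (
    "<front><docTitle>"
    "<titlePart>ARREST CONTRE LES CHASTREZ</titlePart>"
    "</docTitle></front>"
)
print(title_parts_without_main(fragment))
```

An empty list for a file means every titlePart in it is marked type="main".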
All the original data files were created using a schema from the Image Markup Tool (IMT), which was named for the IMT version that created it. That schema is now obsolete, because a new version of the IMT changes it slightly, so I updated all the data as follows:
- Created a new schema from the current version of the IMT, but named it mariage.xsd so that future changes don't involve editing XML files.
- Edited all the text markup files, changing the schema link to point at the new schema. Validated all the files.
- Changed all the image markup XML files to point at the new schema, and made changes so that they would validate.
- Created a new web-sized version of the image for the paris.xml file (new, partly-done markup).
- Uploaded all the image files to the eXist db and tested them. Some changes to the XSLT were necessary to make the new format work; edited imt_p5_1_to_xhtml.xsl (this filename should also probably change in the future).
- Uploaded the new text XML files into eXist and tested them.
All files now use the same schema, called mariage.xsd, and all validate. Made the initial changes on ENDIVE so Claire and France will pick up the changes when they come down to work on Thursday.
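If the schema link ever needs changing again, the edit could be scripted rather than done by hand. The sketch below assumes the files reference their schema through an xsi:noNamespaceSchemaLocation attribute on the root element, which may not match how the IMT files actually link to it; the old schema name in the example is invented. It works on the raw text so that the rest of the file is left untouched.

```python
import re

# Assumption: the schema is referenced via a noNamespaceSchemaLocation attribute.
SCHEMA_ATTR = re.compile(r'(noNamespaceSchemaLocation\s*=\s*)(["\'])(.*?)\2')

def repoint_schema(xml_text, new_schema="mariage.xsd"):
    """Replace the value of the first schema-location attribute; leave everything else as is."""
    def swap(match):
        prefix, quote = match.group(1), match.group(2)
        return f"{prefix}{quote}{new_schema}{quote}"
    return SCHEMA_ATTR.sub(swap, xml_text, count=1)

# Hypothetical file header; "imt_old_version.xsd" is a placeholder name.
old_header = (
    '<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="imt_old_version.xsd"><text/></TEI>'
)
print(repoint_schema(old_header))
```

Files without such an attribute come back unchanged, so it should be safe to run over a whole folder; each file would still need validating afterwards.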
Nov 23: I removed all of the div0 elements from the XML texts.
Polished off some tasks:
- Updated the IMT on both markup computers.
- Added France as a user to the blog.
- Tested Claire's and France's logins.
- Added a link from the project site menu to the blog.
- Created an inc file for the project, for hooking into the HCMC site, and sent the location to Stew.
- Uploaded one changed document to the eXist db and confirmed it's working OK ("Stances à une femme mariée").
I started at 3:30. I checked, and everything is validated and saved.
See you next Thursday! I left at 4:30.
I started working at 3h30. The text "Varin" is done, validated and saved.
We might have to look carefully at adding more details within the notes.
See you next Thursday!
France :)
- Add Francela to the users.
- Make sure C and F can log in and post, and change their pws.
- Add a link from the project Website to the blog.
- Create an inc file for the project, for linking into the HCMC Website.
The Mariage blog has been set up, and Claire Carlin added as a user.