Finished updating the XSLT code, and uploaded new versions of all the image docs to the database. Everything seems to be working OK now.
Category: "Activity log"
Claire reported that when she opened Agresseure, a copy of Correction opened. I checked the files on Endive and my backups, and determined that:
- The two files differ in my backups, so they were still distinct files before January 4.
- The file dates on Endive show that both files were saved on January 4.
The most likely explanation is that Claire or France saved the correction.xml file over the top of the agresseure.xml file. I've solved the problem by copying my backup of agresseure.xml over the top of the file on Endive. My backup was from November 29, so any changes made to agresseure.xml after that date will have been lost, but I suspect there weren't any (unless you made some on January 4, in which case they were lost anyway when correction.xml was saved over the top, unfortunately).
Completed 11/01/07: IMT 1.4 should have been released, and there are small changes to the file format which will require that:
- IMT 1.4 be installed on the project computers.
- All image markup files be loaded and saved, to make the format conversion.
- These new versions of files be uploaded into eXist.
- The XSLT which renders the interactive image pages be updated to handle the changes.
Right now, the only change is from attribute name svg:id to plain id. There may be more by the time of release, though.
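The attribute rename could in principle be scripted rather than done by loading and saving each file. What follows is only a sketch in Python, not the IMT's own conversion. It assumes that the svg: prefix is bound to the standard SVG namespace and that no element carries both svg:id and a plain id. The function name rename_svg_id and the sample fragment are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the svg: prefix in the image markup files is bound to this URI.
SVG_NS = "http://www.w3.org/2000/svg"


def rename_svg_id(root):
    """Rename svg:id attributes to plain id, in place; return how many changed."""
    qualified = "{%s}id" % SVG_NS
    changed = 0
    for elem in root.iter():
        # Skip elements that already have a plain id, to avoid overwriting it.
        if qualified in elem.attrib and "id" not in elem.attrib:
            elem.set("id", elem.attrib.pop(qualified))
            changed += 1
    return changed


# Hypothetical fragment, not a real IMT file.
sample = (
    '<doc xmlns:svg="http://www.w3.org/2000/svg">'
    '<zone svg:id="z1"/><zone id="z2"/>'
    "</doc>"
)
root = ET.fromstring(sample)
count = rename_svg_id(root)
```

If the format changes further before release, a script like this would need revisiting, which is one reason to let the IMT do the conversion where possible.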
Included minutes putting this task together.
Created a sample header: sample_header.xml and linked it from the markup documentation page. Updated the sitemap so that plain xml docs like this can be delivered, and tweaked the markup page, adding a link to the sample header. Tag documentation should now be complete, pending any elaboration of the markup by adoption of new tags.
The simplicity of the tag structure leads me to believe we could cut the size of the schema considerably, if we put our minds to it. However, it's a bit early in the project to be removing options we might end up wanting to use.
I've been working on documenting tagsets, and there are two new pages on the site:
The first is the one most relevant/useful for Claire and France. If you print it, it should come out nice and clear (it has a special print stylesheet to remove the site graphics, headers, footers etc.), so you can keep it by you as a reference. It turns out the number of tags you're actually using is very small, and I think I've covered almost all of them.
I'm now working on a sample <teiHeader> with explanatory comments.
Hi there,
I fixed the following problems with Sonnet de Courval:
1. The docTitle element was put inside the body. docTitle has to be contained in a front element, before the body, like this:
<front>
<docTitle>
...
</docTitle>
</front>
<body>
...
</body>
2. There was no div element inside the body. There must be a containing block element inside the body; inline elements or plain text cannot be direct children of the body tag. So I've added a div tag like this:
<body>
<div>
...
</div>
</body>
3. Markup of stanzas was wrong. I see whole stanzas marked up as lines, like this:
<l> v. La Flegmatique : Luy tournera le dos tout au long de la nuict,
L’appellera vilain, lubrique, deshonneste,
Refrongnera le front en luy tournant la teste :
Le mary amoureux fasché de ce refus,
Caresse la servante &amp; veut monter dessus.
La femme devient jalouse et il doit quitter la maison pour un temps.</l>
The <l> tag marks a single line of verse, not a stanza; a stanza is a line group (<lg>) containing one <l> per line. I think it should appear like this:
<lg>
<l>v. La Flegmatique : Luy tournera le dos tout au long de la nuict,</l>
<l>L’appellera vilain, lubrique, deshonneste,</l>
<l>Refrongnera le front en luy tournant la teste :</l>
<l>Le mary amoureux fasché de ce refus,</l>
<l>Caresse la servante &amp; veut monter dessus.</l>
<l>La femme devient jalouse et il doit quitter la maison pour un temps.</l>
</lg>
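For stanzas that have already been wrapped in a single <l>, the repair is mechanical enough to sketch in code. This is a rough Python illustration only. It assumes the elements are not in a namespace and that the mis-tagged <l> contains nothing but text (no nested tags). The function name split_into_line_group is invented.

```python
import xml.etree.ElementTree as ET


def split_into_line_group(l_elem):
    """Build an <lg> holding one <l> per non-empty text line of l_elem."""
    lg = ET.Element("lg")
    for raw in (l_elem.text or "").splitlines():
        text = raw.strip()
        if text:
            ET.SubElement(lg, "l").text = text
    return lg


# A stanza wrongly wrapped in a single <l>, shortened from the example above.
wrong = ET.fromstring(
    "<l> Luy tournera le dos tout au long de la nuict,\n"
    "L’appellera vilain, lubrique, deshonneste,\n"
    "Refrongnera le front en luy tournant la teste :</l>"
)
stanza = split_into_line_group(wrong)
lines = [line.text for line in stanza]
```

The result still needs checking by eye, since a line break in the file is not always a line break in the verse.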
4. Plain text appears between page breaks, like this:
<pb n="2"/>Pourquoi vouloir nous emprisonner? Pire que le joug des forçats.
<pb n="3"/>Notre paradis devient un enfer. La pire des conditions.
<pb n="3"/>
That's not allowed by the schema; text must be in a container of some kind, such as a paragraph (p). I've supplied p tags in these cases.
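The same kind of fix can be sketched as a script: take any text that directly follows a page break and move it into a new p element. This is a minimal Python illustration under the assumption that the elements are not in a namespace and that each run of stray text belongs in a single paragraph. The function name wrap_stray_text is invented.

```python
import xml.etree.ElementTree as ET


def wrap_stray_text(parent):
    """Move text that directly follows a <pb/> into a new <p>, in place."""
    rebuilt = []
    for child in list(parent):
        rebuilt.append(child)
        # In ElementTree, text after an element's end tag is stored as its tail.
        if child.tag == "pb" and child.tail and child.tail.strip():
            p = ET.Element("p")
            p.text = child.tail.strip()
            child.tail = None
            rebuilt.append(p)
    parent[:] = rebuilt


div = ET.fromstring(
    '<div><pb n="2"/>Pourquoi vouloir nous emprisonner?'
    '<pb n="3"/>Notre paradis devient un enfer.</div>'
)
wrap_stray_text(div)
tags = [child.tag for child in div]
```

Whether the text really is a paragraph, or should be something else such as a heading or a note, is an editorial decision the script cannot make.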
I've commented out all the unmarked or partially-marked-up text, in order to get the document to validate, which it does now. When you're working on a large document, I'd recommend that you work this way:
1. Comment out all the text that you haven't marked up, except for the small section you're working on. Work on one paragraph or one stanza at a time.
2. When you've finished marking a section, validate the document. If it won't validate, it's best to fix the problem immediately; if you continue, you'll just store up more problems for yourself. Validate each section before moving on.
3. When the document validates, un-comment the next small section, and work on that.
I've found, after years of doing XML markup, that this is by far the best way to proceed. If you do a lot of work without doing any validation, the chances are you'll spend hours trying to figure out what the validation problems are at the end, and you may have to re-do a lot of your work (for instance, if you made the same sort of mistake several times).
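As a rough illustration of checking early and often, here is a small Python sketch that reports whether a fragment is well-formed. It is not a substitute for validating against the schema in the XML editor, because it only catches well-formedness errors such as a missing end tag. The function name check_well_formed is invented.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# A correctly closed stanza, and one with a missing </l> end tag.
ok, _ = check_well_formed("<lg><l>Le mary amoureux</l></lg>")
bad, message = check_well_formed("<lg><l>Le mary amoureux</lg>")
```

The parser message includes a line and column position, which is usually enough to find the problem when only one small section has changed since the last check.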
I've also removed this document from the database, because it's not fully marked up yet.
...as requested.
By next Thursday, I should have a code review done, and we can start elaborating some markup guidelines. I'll also take the Allard down from the site -- I was posting as many documents as I could find to see how the titles and headings were working. I'll try to figure out what the problem is with Sonnet de Courval too.
Markup of text files has made use both of formal docTitle tags and of head tags inside divs to represent titles; this may or may not be consistent (it will depend really on the type of document that's being marked up), but there was some concern that documents using head instead of docTitle were not having their titles rendered.
Checked this out with the newly-edited XML files, and it seems that both types of markup are working; where there is a docTitle tag, it is rendered as a separate block at the beginning of the document, and where there are head tags in divs, these are rendered as headings in the main block of the document. This seems to me a reasonable state of affairs -- some documents have both docTitle and head elements, and it makes sense to separate them. Pending a review of encoding practices, which will lead to a set of formal guidelines, I would say that the current behaviour on the site is acceptable. If you think there should be changes, please comment in response to this.
Examples:
- Using both docTitle and head: La Gazette
- Using only docTitle: ARREST CONTRE LES CHASTREZ (this document has a titlePart, but it isn't marked as type="main", so it doesn't appear bold/large. Why?)
- Using only head tags: La Bourgeoise d’esprit
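For the encoding review, it may help to list which documents fall into which of these three groups. The following Python sketch shows one way to classify a single document. It assumes the elements are not in a namespace and that headings are head elements directly inside a div. The function name title_markup and the sample documents are invented.

```python
import xml.etree.ElementTree as ET


def title_markup(root):
    """Report whether a document uses docTitle, head inside div, both, or neither."""
    has_doc_title = root.find(".//docTitle") is not None
    has_head = root.find(".//div/head") is not None
    if has_doc_title and has_head:
        return "both"
    if has_doc_title:
        return "docTitle only"
    if has_head:
        return "head only"
    return "neither"


# Hypothetical skeleton documents, one for each of two cases.
with_both = ET.fromstring(
    "<text><front><docTitle><titlePart>T</titlePart></docTitle></front>"
    "<body><div><head>H</head><p>...</p></div></body></text>"
)
head_only = ET.fromstring(
    "<text><body><div><head>H</head><p>...</p></div></body></text>"
)
results = [title_markup(with_both), title_markup(head_only)]
```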
All the original data files were created using a schema from the Image Markup Tool, which was named for the IMT version which created it. This is now obsolete, with a new version of the IMT which slightly changes the schema, so I updated all the data as follows:
- Created a new schema from the current version of the IMT, but named it mariage.xsd so that future changes don't involve editing XML files.
- Edited all the text markup files, changing the schema link to point at the new schema. Validated all the files.
- Changed all the image markup xml files to point at the new schema, and made changes so that they would validate.
- Created a new web-sized version of the image for the paris.xml file (new, partly-done markup).
- Uploaded all the image files to the eXist db and tested them. Some changes to the XSLT were necessary to make the new format work; edited imt_p5_1_to_xhtml.xsl (this filename should also probably change in the future).
- Uploaded new text xml files into eXist and tested them.
All files now use the same schema, called mariage.xsd, and all validate. Made the initial changes on ENDIVE so Claire and France will pick up the changes when they come down to work on Thursday.
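Repointing a file at the renamed schema can be sketched as below. This is an illustration only, and it rests on an assumption I have not confirmed here: that the files reference their schema through an xsi:noNamespaceSchemaLocation attribute on the root element. The function name point_at_schema, the root element name, and the old schema filename in the sample are invented.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Keep the conventional xsi: prefix when the tree is written back out.
ET.register_namespace("xsi", XSI_NS)


def point_at_schema(root, schema_name="mariage.xsd"):
    """Set the root's xsi:noNamespaceSchemaLocation to schema_name."""
    root.set("{%s}noNamespaceSchemaLocation" % XSI_NS, schema_name)
    return root


# Hypothetical document; old_schema.xsd is a made-up filename.
sample = (
    '<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="old_schema.xsd"><text/></TEI>'
)
root = point_at_schema(ET.fromstring(sample))
serialized = ET.tostring(root, encoding="unicode")
```

ElementTree's default parser discards comments, so running real files through a script like this would lose any commented-out text. For the actual files, editing the schema link by hand or with a comment-preserving tool is safer.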