Built an installer for IMT 1.5 and installed it on Endive and Chicory. Seems to work OK. I opened and re-saved all the old files using it, in order to get rid of the old appDesc blocks, and then re-edited them to change the schema filename to mariage.rng. In future, it shouldn't be necessary for any version of the IMT to mess with schema processing-instructions, so future upgrades should be less problematic.
Category: "Activity log"
Since the IMT is being converted so that it uses a RelaxNG schema instead of an XSD schema, this project also needs to be converted. I took out all the XSD and XSI namespace and schema references, and also added the Oasis and oXygen processing-instruction links to a new mariage.rng schema.
Moved these changes, along with the new schema (which is a single standalone file, although large), back onto the project account. Validated all the files except Sonnet de Courval, fixing some errors in paris.xml (which had line elements in paragraphs instead of in line-groups). Uploaded all the files into the database.
Now, before any further work is done on the image markup files, I need to make sure there's a working installation of IMT 1.5 on the two work machines.
France has been translating the markup documentation, and I've now changed the menu item which points to it to show "Encodage", and replaced the English text of the page with her translation as far as she's got.
We're in a dilemma as to whether to serve XHTML documents as application/xhtml+xml, which is correct, or text/html, which is wrong, but more widely supported. This morning I did some testing on various browsers to see what works and what doesn't:
MIME type application/xhtml+xml
Browser | IMT pages | Text pages | Search |
Firefox 2 | OK | No BG image | OK |
Opera 9 | OK | Note link bounces to bottom (:target pseudo-class not supported) | Fails (nothing shows up, no errors) |
Safari | Popups work, but no annotation menu | Links don't work | Links don't work |
MIME type text/html
Browser | IMT pages | Text pages | Search |
Firefox 2 | OK | OK | OK |
Opera 9 | OK | Note link bounces to bottom (:target pseudo-class not supported) | Fails (nothing shows up, no errors) |
Safari | OK | OK | Links don't work |
The results suggest that we really have no choice about using text/html, and even there, we have some problems with AJAX in Safari and Opera which need to be addressed.
Finally got the thumbnails working: had to use negative margins on the enclosing div, rather than on the image itself. Tweaked the code a little more to get rid of duplicate returns, and then added functionality so that when you click on a hit which is inside an annotation area, you go directly to the image, with that annotation popping up for you.
Then I worked on getting the category selection list going. It was a bit complicated because the distinct-values we want to test on are attribute values, but the rendered option text we need for the drop-down is the content of a child element. In the end I got it working, so the select element is there on the search page, but at the moment it does nothing. It needs to be hooked up to a block of code which will restrict the text search based on the category, or, if there's no search text, will restrict the document list to those which have annotations in that category.
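To illustrate the shape of the problem (de-duplicating on an attribute value while taking the visible text from a child element), here is a minimal Python sketch. The real implementation is in XQuery; the element and attribute names used here (`category`, `id`, `label`) are hypothetical stand-ins, not the project's actual markup.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data: the same category can appear in several documents.
SAMPLE = """
<docs>
  <doc>
    <category id="graveur"><label>Graveur</label></category>
    <category id="animal"><label>Animal</label></category>
  </doc>
  <doc>
    <category id="graveur"><label>Graveur</label></category>
  </doc>
</docs>
"""


def category_options(root):
    """Return (value, label) pairs, one per distinct id, in first-seen order."""
    options = {}
    for cat in root.iter("category"):
        value = cat.get("id")
        if value is None or value in options:
            continue  # skip categories without an id, and duplicates
        label = (cat.findtext("label") or "").strip()
        options[value] = label or value  # fall back to the id if no label text
    return list(options.items())


def render_select(options, name="category"):
    """Render the option pairs as an HTML select element."""
    lines = [f'<select name="{escape(name)}">']
    for value, label in options:
        lines.append(f'  <option value="{escape(value)}">{escape(label)}</option>')
    lines.append("</select>")
    return "\n".join(lines)


options = category_options(ET.fromstring(SAMPLE))
print(render_select(options))
```

The attribute value goes into the option's `value` (which is what a search would test against), while the child element's text is what the user sees.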
It would be useful to add a new drop-down list to the search screen enabling the user to choose to search only within specific annotation categories. (Obviously this search would be limited to images.) One great advantage of this would be that we could retrieve sets of annotations, and it would also be possible to pull back thumbnails of the images themselves, so you could look at (for instance) all chickens appearing in the images.
I've spent a little time thinking about this, and there are a few issues to decide before we can plan it properly:
- Building the select drop-down itself should be straightforward (the same as the other drop-downs).
- Building the XQuery clause to do this search is not so simple, though; we should shell out to a separate function to do it, as we do with the text search.
- What we need to pull back is going to be complicated too, especially if it's combined with a text search. It's not exactly clear what the user might intend by searching for "chat" (text) and "Graveur" (annotation category). Does this mean that she's searching for all instances of "chat" that appear in an annotation which is in the Graveur category? Or does it perhaps mean all documents which have annotations in the Graveur category, which also contain "chat"?
- One approach is to say that if a category is chosen, then the search is restricted to hits within annotations in that category; and what will be retrieved, for each document, is a reduced document which includes only the <rect> and <div> elements for the annotations containing hits; then the XSLT code can process only those annotations, and for each annotation, produce a thumbnail view of the image area using the web-sized version of the image, computing the view rectangle from the original image size and the annotation coordinates.
- Such an approach could be usefully extended if there were two controls: one checkbox for "Search only in image annotations", which in turn enables the other, "Restrict search to this category of annotation only". If the first checkbox is checked, whatever the search, what is retrieved and displayed would be shown in a thumbnail view.
- Another option is to say that whenever a hit occurs within an annotation, the XSLT should be intelligent enough to notice that, and render the hit with a thumbnail. This is something we can implement first, and get working, before adding other controls on the page.
On balance, then, we should probably proceed by implementing the thumbnail display in XSLT first; once that's working, we should add the filter controls to the search page and implement the XQuery.
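The view-rectangle computation mentioned above is simple proportional scaling. Here is a rough Python sketch of the arithmetic, assuming the annotation coordinates are stored in pixels relative to the original image; the names `Rect` and `view_rect` are made up for illustration, and the real version would live in the XSLT.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def view_rect(annotation: Rect,
              original_size: Tuple[int, int],
              web_size: Tuple[int, int]) -> Rect:
    """Scale an annotation rectangle from original-image pixels
    to pixels on the web-sized version of the same image."""
    orig_w, orig_h = original_size
    web_w, web_h = web_size
    if orig_w <= 0 or orig_h <= 0:
        raise ValueError("original image dimensions must be positive")
    # Multiply before dividing to limit floating-point rounding noise.
    return Rect(
        x=round(annotation.x * web_w / orig_w),
        y=round(annotation.y * web_h / orig_h),
        width=round(annotation.width * web_w / orig_w),
        height=round(annotation.height * web_h / orig_h),
    )


# Example: a 2000x3000 original shown at 800x1200 on the web.
thumb = view_rect(Rect(500, 750, 250, 300), (2000, 3000), (800, 1200))
print(thumb)
```

The resulting x and y offsets are what would be applied (negated) to position the web-sized image inside a thumbnail container of the resulting width and height.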
Posting time spent figuring this out...
Newer Image Markup files were lacking metadata and publication statements; older ones had metadata in unstructured <bibl> tags. We trawled through all of them and added proper <biblStruct> tags, with all the information we could gather. There are still lots of gaps, some temporary (where info is available either in notes or from the BN) and some permanent (where, for instance, the publication date is not known).
Encoding choices
The documents in this database are marked up in the Text Encoding Initiative's P5 format. See the schemas page for more details. All the documents use the same schema, "mariage.xsd" (which links to other files).
Document types
The documents in the database fall into the following categories:
1. Marked-up images
Images are marked up using the Image Markup Tool. In the <teiHeader>, the normal conventions described below are applied; the Image Markup Tool adds any other information itself whenever the data is saved or uploaded, so additional information will appear in the <teiHeader> of these files. Within the annotations themselves (which behind the scenes are tagged as <div>), normal block-level elements are used (<p>, <lg> and so on), and below that level the markup is as described below.
2. Standalone texts
Standalone texts are documents such as pamphlets, which do not form part of a larger text such as an anthology. These documents are normally marked up using the following structure:
    <TEI>
      <teiHeader></teiHeader>
      <text>
        <front>
          <docTitle>
            <titlePart type="main">[Document title]</titlePart>
          </docTitle>
          <docAuthor>
            <name>[Document Author]</name>
          </docAuthor>
        </front>
        <body>
          <div>[Main body of the text]</div>
        </body>
        <back>[Optional back matter]</back>
      </text>
    </TEI>
3. Texts taken from a larger source document
Some texts
Fixed it: had to use <xsl:sort select="upper-case(tei:title)" />
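For anyone wondering why upper-casing the sort key helps: assuming the problem was a plain codepoint comparison putting every capitalised title ahead of every lower-case one, the effect is easy to reproduce outside XSLT. A small Python sketch with made-up titles:

```python
# Made-up titles, mixing upper- and lower-case initial letters.
titles = ["le Mariage", "Banquet", "amour"]

# Plain codepoint ordering: "B" (66) sorts before "a" (97) and "l" (108).
codepoint_order = sorted(titles)

# Sorting on an upper-cased key, the same idea as upper-case(tei:title).
case_insensitive_order = sorted(titles, key=str.upper)

print(codepoint_order)         # ['Banquet', 'amour', 'le Mariage']
print(case_insensitive_order)  # ['amour', 'Banquet', 'le Mariage']
```

Only the sort key is upper-cased; the titles themselves are displayed unchanged.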
Most of the image markup files were lacking <classCode scheme="mariage">gravure</classCode> elements; added those. For existing elements, the original English versions of the codes needed to be translated, so I did that for all the files and re-uploaded them into the database.