Most of the image markup files were lacking <classCode scheme="mariage">gravure</classCode> elements; added those. For existing elements, the original English versions of the codes needed to be translated into French, so I did that for all the files and re-uploaded them into the database.
We still have some work to do on the XHTML mime type issue, and we have to discover once and for all whether it's practical to use it. Proceed thus:
- Remove the meta tag to get a "pure" application/xhtml+xml site, and test on various browsers. Make notes of what works and what doesn't.
- Restore the meta tag, and change the sitemap to serve pages as xhtml11_compat for all browsers (not just IE, as is currently the case). (xhtml11_compat is my serializer for XHTML 1.1 served as text/html.) Test again, and note the results.
- If we find that things generally work under text/html (especially if they work in Safari and Opera), we should probably give up on the use of application/xhtml+xml for the moment, unfortunate though this would be. We can always change this later when browsers become more reliable in this regard. Meanwhile, we need a working site. The site can still make use of proper DOM methods (where they're supported) instead of innerHTML.
Once we have a clear idea of what's happening, document it to death, and then set up a system which works for all our core browsers.
A pretty good search page is now online here. However, it only works properly in Firefox and IE. These are the good things about it, most of which represent significant steps forward over previous projects:
- It uses proper DOM methods instead of relying on innerHTML (except when running on IE, which can't handle importNode, among other things).
- It has a working implementation of "keyword-in-context" (KWIC) displays of the search results, written in pure XSLT, rather than relying on the problematic (and proprietary) extension functions in eXist. My XSLT seems to work better than anything that can be achieved using the eXist functions (which we used in the EMLS project).
- Where a search finds a hit inside an element with an xml:id attribute, it makes a hash link to that element id in the Web page; in the case of image markup pages, this pops up the annotation containing the hit. This will eventually be extended to include highlighting on the page.
- It does "any word", "all words" and "exact phrase" searching. The first two use eXist's shortcut functions; the last uses xpath contains(), and it isn't appreciably slower.
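The keyword-in-context idea can be sketched in XSLT 2.0 roughly as follows. This is not the stylesheet actually used on the site: the term and width parameters, their default values, and the output markup are all assumptions made for illustration. It shows only the first occurrence of the term in each element, the match is case-sensitive, and ancestors that carry an xml:id will match as well as the nearest one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameters: the search term, and the number of
       characters of context to show on each side of it. -->
  <xsl:param name="term" as="xs:string" select="'mariage'"/>
  <xsl:param name="width" as="xs:integer" select="40"/>

  <xsl:template match="/">
    <ul>
      <!-- Every element that has an xml:id and whose text contains the term. -->
      <xsl:for-each select="//*[@xml:id][contains(normalize-space(.), $term)]">
        <xsl:variable name="text" select="normalize-space(.)"/>
        <xsl:variable name="before" select="substring-before($text, $term)"/>
        <xsl:variable name="after" select="substring-after($text, $term)"/>
        <li>
          <!-- Hash link to the element that contains the hit. -->
          <a href="#{@xml:id}">
            <!-- The last $width characters before the hit. -->
            <xsl:value-of select="substring($before, max((1, string-length($before) - $width + 1)))"/>
            <strong><xsl:value-of select="$term"/></strong>
            <!-- The first $width characters after the hit. -->
            <xsl:value-of select="substring($after, 1, $width)"/>
          </a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Because it uses only standard string functions, a stylesheet along these lines does not depend on any eXist extension function.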
There are two classes of remaining problem. The first is browser support:
- Opera appears not to insert the results into the page. Although it claims to support importNode, I suspect this may be the root of the problem with Opera. In order to confirm this, I'll need to try using cloneNode instead; that's not the "proper" way to do it, but has been used in the past. If that works, we know the problem is with DOM implementation, and we can try to find a workaround. At worst, we demote Opera to the same status as IE and use innerHTML.
- Safari displays the results OK, but the links inside the results do not function as links. This may be because the imported nodes are not construed as XHTML for some reason. I don't have an angle on this yet, but the problem does not occur with WebKit (the development version of Safari), so it may just go away in time. Nevertheless, we need a workaround.
- The progress bar doesn't seem to show up on any browser (although it may be going by too fast to see).
The second class of problem may be a significant factor in the issues above. I'm still doggedly trying to serve my XHTML 1.1 documents as application/xhtml+xml instead of as text/html (although I just noticed that I'm specifying text/html in a meta tag, which might override the server settings on some browsers). It may be the case that, even using proper DOM methods, we still don't inherit enough useful functionality from HTML (events and behaviours such as linking) when the pages are served as application/xhtml+xml, so that making a complex interactive site may be impossible, or horribly difficult, unless we go back to text/html. To test this, I need to do the following:
- Remove the meta tag to get a "pure" application/xhtml+xml site, and test on various browsers. Make notes of what works and what doesn't.
- Restore the meta tag, and change the sitemap to serve pages as xhtml11_compat for all browsers (not just IE, as is currently the case). (xhtml11_compat is my serializer for XHTML 1.1 served as text/html.) Test again, and note the results.
- If we find that things generally work under text/html (especially if they work in Safari and Opera), we should probably give up on the use of application/xhtml+xml for the moment, unfortunate though this would be. We can always change this later when browsers become more reliable in this regard. Meanwhile, we need a working site. The site can still make use of proper DOM methods (where they're supported) instead of innerHTML.
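For the second test step, one way to serve text/html to every browser would be to bypass the browser selector in the sitemap for the duration of the test. This is only a sketch: it assumes the pipeline currently ends with a map:select type="browser" block that chooses between two serializers, and that this block can simply be swapped for a single unconditional serializer.

```xml
<!-- Temporary test configuration (assumed pipeline layout):
     this single line stands in for the whole browser selector,
     so every browser receives the text/html serializer. -->
<map:serialize type="xhtml11_compat"/>
```

Putting the original selector back afterwards restores the current behaviour, where only IE gets text/html.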
The search page work is proceeding slowly. I now have a working retrieval system based on responseXML, but there are two problems at the moment:
- Neither Opera nor IE seems to be able to handle importNode followed by appendChild (the proper DOM methods for pulling a node from the responseXML document into the page), so for those browsers I have to use innerHTML. Urg.
- Safari gives me the dreaded "DOM exception 7" error, which presumably can't arise from the same problem, since Safari doesn't allow innerHTML in an XML document, so that will take more investigation.
The determination to do this a) correctly where possible, b) at least effectively where it can't be correct, and c) reliably for the future is proving very messy, but this is really the project where I'm working out all the bugs in this stuff for future projects, and devising a set of solid strategies for XSLT 2, multiple-namespace documents and application/xhtml+xml, so I think it's worth the pain.
This menu bug turned out to be another IE-related thing. This is how it works:
- Pure CSS menus depend on the :hover pseudo-class.
- IE6 does not support this on anything except links (a elements), so the menus never appear there.
- As a compromise, I had added an onclick event to the Menu header to make the menu show on IE6.
- Whenever you use JavaScript to set a CSS property (display: block, in this case), most browsers cache this property and then retain it when you use the back button to return to the page, so the menu is displayed.
- Switching to pure CSS (removing the onclick attribute) solved the bug on all browsers, but made the menu inaccessible on IE6.
- I fixed this by using browser detection (boo!) to trap for IE6 and assign the onclick handler to the menu just for IE6.
- I also added a :hover for the document body to hide the menu automatically when you move the mouse anywhere outside the menu.
Tested and working on Firefox, IE6, IE7, Opera and Safari.
Completed 19/02/07.
I've just noticed an odd bug with the site menu. If you use the menu to navigate to another page, then use the browser's back button to return, the menu is showing and fixed on the page (in other words, it doesn't disappear when you mouseout). This is a bit odd. The menu is based on Eric Meyer's Pure CSS Menus, which show a much less annoying version of the bug (on his page, as soon as you mouseover something on the page after going back to it, the menus disappear). Figure out why this is happening and fix it.
Worked with Claire to translate the search page into French. France will work on translating the other pages.
There's one remaining issue with this: the text type designation, which is specified in a <classCode> element in the teiHeader, currently has values like "prose", "verse" and "mixed". These values are in English. The search page harvests these to populate the "Genre de texte" dropdown selector, so these also need to be converted to French in all the documents in the collection. I'll do this manually. We'll use "gravure", "prose", "vers" and "prose et vers" for the values. This element may be repeated in future, enabling classification of texts in several ways (polemical, religious, etc.).
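As a sketch of where this sits in a document, assuming the standard TEI position for classification codes (profileDesc/textClass inside the teiHeader) and the scheme="mariage" value already used in the image markup files:

```xml
<profileDesc>
  <textClass>
    <!-- One of the four agreed French values:
         gravure, prose, vers, prose et vers -->
    <classCode scheme="mariage">prose et vers</classCode>
  </textClass>
</profileDesc>
```

If further classifications are added later, each would presumably get its own <classCode> with a different scheme, so that the search page can keep harvesting the text type separately.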
While working with Claire on the biblStruct elements we need to include in our document headers, I came up against a major problem with P5. Previous versions of P5 (including the one used to generate our Mariage project schema) included an element <dateRange>, which could be used to encode a range of dates when the precise date was unclear or unknown. However, <dateRange> has recently been removed from P5. A useful P4 alternative, the notBefore and notAfter attributes, has also been removed along with the from and to attributes, leaving <date> with only a dur attribute with a data type of xsd:duration to fill the gap.
The dur attribute is not intended for date ranges, and doesn't do the job. I wrote to the TEI list about this, and immediately got a phone call from Syd Bauman. After a long discussion, I wrote to the list again to propose that from, to, notBefore and notAfter be revived in P5, since they're clearly exactly what's needed in this case, and it's obviously not a boundary case. I hope we'll get them back. Meanwhile, Claire is using the precision="low" attribute/value to signal that a date is unsure (although Syd tells me precision is also gone, or going, from P5).
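To make the proposal concrete, this is roughly how the two pairs of attributes would be used if they come back. It is a hypothetical sketch: it is only valid if the attributes are reinstated under these names, and the years are illustrative rather than taken from one of our documents.

```xml
<!-- An uncertain date: published some time between 1650 and 1680. -->
<date notBefore="1650" notAfter="1680">1650-1680</date>

<!-- A known span: something that ran from 1650 to 1680. -->
<date from="1650" to="1680">1650-1680</date>
```

The distinction matters for our headers because almost all of our problem cases are of the first kind: a single publication date that we can only bracket.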
This has been a major problem for all our XHTML projects. Briefly, the problem is this:
- XHTML documents must be served with the mime type "application/xhtml+xml", because the W3C says so.
- If we do serve documents with this mime type, two problems occur:
  1. Internet Explorer doesn't display the document; it just offers to save it. Even IE7 has this problem: it was a policy decision by IE's developers.
  2. Safari quite logically refuses to allow the use of innerHTML when the mime type is not text/html.
I now have a solution to #1: using Cocoon's built-in browser selector in Cocoon's sitemap, we can serve pages to IE with the wrong mime type, but serve the correct mime type to all the other browsers. It works like this:
<map:select type="browser">
  <map:when test="explorer">
    <!-- for IE, we have to serve text/html :-( -->
    <map:serialize type="xhtml11_compat" />
  </map:when>
  <map:otherwise>
    <map:serialize type="xhtml11"/>
  </map:otherwise>
</map:select>
The "xhtml11_compat" serializer is a custom serializer we've added to the root sitemap of Cocoon, which looks like this:
<map:serializer logger="sitemap.serializer.xhtml"
                mime-type="text/html"
                name="xhtml11_compat"
                pool-grow="2" pool-max="64" pool-min="2"
                src="org.apache.cocoon.serialization.XMLSerializer">
  <doctype-public>-//W3C//DTD XHTML 1.1//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd</doctype-system>
  <encoding>UTF-8</encoding>
</map:serializer>
This just serializes the document as XML with an XHTML doctype, but with a mime type of text/html. It's a hack, but it's IE's fault, and we're only breaking the rules for IE; other browsers see the correct mime type.
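For comparison, the "xhtml11" serializer used in the map:otherwise branch would presumably look the same apart from its name and mime type. The definition below is an assumption modelled on xhtml11_compat, not copied from our sitemap, so the real one may differ in its other settings.

```xml
<!-- Assumed definition: identical to xhtml11_compat except for
     the name and the mime-type attribute. -->
<map:serializer logger="sitemap.serializer.xhtml"
                mime-type="application/xhtml+xml"
                name="xhtml11"
                pool-grow="2" pool-max="64" pool-min="2"
                src="org.apache.cocoon.serialization.XMLSerializer">
  <doctype-public>-//W3C//DTD XHTML 1.1//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd</doctype-system>
  <encoding>UTF-8</encoding>
</map:serializer>
```

The point of keeping the two definitions parallel is that the output is byte-for-byte the same document; only the Content-Type header the browser sees changes.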
The second problem remains, but I have an angle on it, as reported previously, by using XMLHttpRequest.responseXML and DOM methods to add AJAX-retrieved elements to the page DOM instead of using innerHTML. On that topic, I noticed this quote on the release notes for Firefox Gran Paradiso Alpha 2 (meaning the latest alpha for Firefox 3):
Moving DOM nodes between documents now requires a call to importNode or adoptNode as per the DOM specification.
I need to make sure that I'm testing for the availability of those methods, and using them if they're there, before falling back to cloneNode, appendChild, copyNode or whatever simpler (but incorrect) options are currently supported in browsers such as Firefox 2.
Documents marked up using the IMT have minimal headers at the moment. These need to be expanded, with the addition of two different blocks of information:
The first is the <publicationStmt> element, which needs to be fleshed out like this:
<publicationStmt>
  <idno>arrest_contre_les_chastez.xml</idno>
  <publisher>Humanities Media and Computing Centre</publisher>
  <pubPlace>
    <address>
      <addrLine>University of Victoria</addrLine>
      <addrLine>B.C., Canada</addrLine>
    </address>
  </pubPlace>
  <date value="2004-08-20">20 August 2004</date>
  <availability status="free">
    <p>Copyright 2004. This text is freely available provided the text is distributed with the header information provided.</p>
  </availability>
</publicationStmt>
The second is the sourceDesc, which needs to contain at least one <biblStruct> element, for the original publication, and possibly more, for cases where the text has been republished in later anthologies. This is an example:
<biblStruct>
  <monogr>
    <title>Sermon pour la consolation des cocus</title>
    <author><name>anon.<reg>anon.</reg></name></author>
    <imprint>
      <pubPlace>Paris</pubPlace>
      <date value="1624">1624</date>
    </imprint>
  </monogr>
</biblStruct>
<biblStruct>
  <analytic>
    <title>Sermon pour la consolation des cocus</title>
    <author><name>anon.<reg>anon.</reg></name></author>
  </analytic>
  <monogr>
    <title>Le Bibliophile fantaisiste ou choix de pièces désopilantes de rares</title>
    <imprint>
      <publisher>J. Gay et fils</publisher>
      <pubPlace>Turin</pubPlace>
      <date value="1869">1869</date>
      <biblScope type="pp">359-70</biblScope>
    </imprint>
  </monogr>
</biblStruct>
Where dates are vague, we can use the precision attribute to specify certainty, and put a date range in the date tag content:
<date value="1670" precision="low">1650-1680</date>
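Putting the two blocks together, the expanded header would have roughly this shape. This is a skeleton only, assuming the usual TEI ordering inside <fileDesc>; the comments stand in for the content shown in the examples above, and the title is left empty rather than guessed.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title><!-- document title --></title>
    </titleStmt>
    <publicationStmt>
      <!-- idno, publisher, pubPlace, date and availability -->
    </publicationStmt>
    <sourceDesc>
      <!-- one biblStruct for the original publication,
           plus one for each later republication -->
    </sourceDesc>
  </fileDesc>
</teiHeader>
```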