Finished off the presentation by creating a couple of diagrams, and then added a block of explanatory text to each slide to act as a "script" for Karin. Used the CSS and scripting hacks we used for CASTA to get a decent printable version of the presentation direct from the XML, incorporating the script stuff. Posted it on the teiJournal site, and handed the printout over to Karin for her to take a look at. We have plenty of time for tweaking before July.
For the PKP conference in July, Karin will have to deliver my presentation on teiJournal, because I'll be away. I started adapting the presentation from Vancouver a few weeks ago to make it more detailed and clear. Did some research on all the existing journal e-publication tools I could find, looking for any others which use XML in the way teiJournal does, but couldn't find any.
There's another day or so of work to do on this presentation.
The book "Programming Firefox" arrived, and the timing was good; I'm considering the possibility of using a XUL-based application for the editors to access the database, because it would allow more flexibility in terms of local file access and storage. I started working through the early chapters, and reading around the later ones to figure out how practical it is to deploy a XUL-based application right now. It seems quite feasible, although the standalone XULRunner app is still a developer preview at this point. I'll keep working through the book, because I can see lots of possibilities for an effective GUI using Firefox/Gecko.
Also spent some time looking for ways to batch-convert my growing collection of SVG GUI icons to PNGs for GUI building in Java and XUL. ImageMagick looked promising, but it does a lousy job of converting SVGs; they come out without colours. The GIMP does a good job of conversion, but isn't batch-scriptable. This needs more investigation.
XSLT 2.0 brings a lot of extra functionality to the table, and teiJournal will take advantage of it. One obvious step forward is the ability to define your own functions. In many contexts, this is much better than the old method of doing things using named templates, because you can call a function from right inside an XPath expression. We've recently needed some string-manipulation functions, and I'm anticipating the need for many more, so I've started building a utilities.xsl library, which will be released under MPL 1.1 if it ever gets released. Today I coded and tested these functions:
- mdh:lastIndexOf, a simple but necessary routine that's missing from XPath 2.0 for some reason.
- mdh:truncate, a function for truncating a string to a particular maximum length and adding an ellipsis; this also has an option to trim it back to the last preceding space, so that words are not broken in the middle.
- mdh:stripLeadingArticle, a function for use in sorting and presenting document titles which have leading articles. For instance, this title:
The Island of Dr. Moreau
might be sorted on the word "Island", and presented like this:
Island of Dr. Moreau, The
The function handles this. I coded it based on the Java TitleSortComparator I wrote yesterday; that works fine, but this provides another option in contexts where we might not be able to install our own Java classes under Cocoon. It can also be used for formatting the titles for display, which the sort comparator can't do.
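To make the behaviour of the three functions concrete, here is a rough Python sketch of the same logic. It is not the XSLT from utilities.xsl. The function names, the English-only article list, the zero-based return value of last_index_of, and the choice to append the ellipsis after the cut (rather than counting it within the maximum length) are all assumptions made for illustration.

```python
# Illustrative Python equivalents of the three utility functions described above.
# Names, defaults, and the article list are assumptions, not the real XSLT code.

LEADING_ARTICLES = ("the", "a", "an")  # assumption: English articles only


def last_index_of(text, search):
    """Return the zero-based index of the last occurrence of search, or -1."""
    return text.rfind(search)


def truncate(text, max_length, trim_to_space=False, ellipsis="..."):
    """Cut text to max_length characters and append an ellipsis.

    With trim_to_space=True the cut is moved back to the last preceding
    space, so that a word is not broken in the middle.
    """
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    if trim_to_space:
        last_space = cut.rfind(" ")
        if last_space > 0:
            cut = cut[:last_space]
    return cut.rstrip() + ellipsis


def strip_leading_article(title, move_to_end=True):
    """Remove a leading article; optionally append it after a comma."""
    first, separator, rest = title.partition(" ")
    if separator and rest and first.lower() in LEADING_ARTICLES:
        return f"{rest}, {first}" if move_to_end else rest
    return title


if __name__ == "__main__":
    titles = ["The Island of Dr. Moreau", "Animal Farm", "A Tale of Two Cities"]
    # Sort on the title without its article, then display with the article moved.
    for t in sorted(titles, key=lambda t: strip_leading_article(t, False).lower()):
        print(strip_leading_article(t))
```

Passing move_to_end=False gives the sort key, while the default gives the display form, which is the extra thing the sort comparator can't do.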
There's a lot to learn in XSLT 2.0, but this is coming along nicely. I still have to trawl through all my other XSLT projects to identify other utility code that could be encapsulated in this way.
I now have the basic system working in pilot form. This is what I did today:
- Made a revision to the previous posting, after noticing that I'd omitted the comma from the list of characters which need to be substituted in the selector.
- Created a Java application which can translate CSS files into xsl:attribute-sets. This is probably not going to be required on the server itself, but Greg made the point that if it could be run from the command line, it could be called as a transformer from Cocoon, so I'll probably rewrite it as a command-line class at some stage, and have that class instantiated by the GUI app for its purposes. This took much of the day, but in the process, I learned once more about how to create a Java app, and added a very useful utility class I can use and extend in all my Java stuff.
- Created a complicated xsl:attribute-set file based on a Mariage stylesheet.
- Completed the xsl_attribute_sets_to_css.xsl file.
- Tested to make sure the transformation was working OK locally.
- Uploaded both files into the db.
- After some tweaking of the stylesheet, managed to get the sitemap set up so that it can deliver a CSS file generated from the xsl attribute set file in the db, using the transformation file also stored in the db. For the record, this is how it works:
<map:match pattern="*.css">
<map:generate src="xmldb:exist:///db/teiJournal/styles/test/{1}.xsl" />
<map:transform type="saxon" src="xmldb:exist:///db/teiJournal/xsl_trans/css/attribute_sets_to_css.xsl" />
<map:serialize type="text" />
</map:match>
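For anyone trying to picture what the transformation step in that pipeline produces, here is a minimal Python sketch of the same idea: read a stylesheet containing xsl:attribute-sets and write out CSS rulesets. It is a stand-in for the XSLT file, not a copy of it. It assumes that each xsl:attribute holds one CSS property (name) and value (content), and that every encoded name carries a known prefix; the "css-" prefix is a made-up placeholder.

```python
# Sketch: turn xsl:attribute-set elements into CSS rulesets.
# Assumptions: one CSS property per xsl:attribute, and a hypothetical
# "css-" prefix on every attribute-set name that encodes a selector.
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"
PREFIX = "css-"  # hypothetical prefix

# Substitute letters (written as escapes) mapped back to the CSS characters
# they stand for, in the same order as the substitution table.
SUBSTITUTES = ("\u00f8\u01b8\u01b7\u0167\u00fc\u00f1\u00df\u00ea\u0130"
               "\u0283\u0285\u0298\u0126_\u028c\u0197\u0279")
CSS_CHARS = '*[]="~$^|():# >+,'
DECODE = str.maketrans(SUBSTITUTES, CSS_CHARS)


def attribute_sets_to_css(xslt_source):
    """Return CSS text for every prefixed attribute-set in the stylesheet."""
    root = ET.fromstring(xslt_source)
    rules = []
    for attribute_set in root.iter(XSL + "attribute-set"):
        name = attribute_set.get("name", "")
        if not name.startswith(PREFIX):
            continue  # not an encoded selector, so not CSS data
        selector = name[len(PREFIX):].translate(DECODE)
        declarations = [
            f"  {attr.get('name')}: {(attr.text or '').strip()};"
            for attr in attribute_set.findall(XSL + "attribute")
        ]
        rules.append(selector + " {\n" + "\n".join(declarations) + "\n}")
    return "\n\n".join(rules) + "\n"


EXAMPLE = """<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:attribute-set name="css-div.note_p">
    <xsl:attribute name="color">#333</xsl:attribute>
    <xsl:attribute name="margin-left">2em</xsl:attribute>
  </xsl:attribute-set>
</xsl:stylesheet>"""

if __name__ == "__main__":
    print(attribute_sets_to_css(EXAMPLE))
```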
This is good progress. Now I have to revisit the database structure again, and figure out what needs to be stored where.
My plan is to store the data for CSS files in the eXist db in the form of xsl:attribute-sets. This presents a number of challenges, and I got to work on one of them today.
This approach is useful because it enables us to store CSS data in a highly-structured format, so that we can read and write individual properties and values in the database; thus we can allow the user to customize the layout and appearance of documents through a browser-based GUI, and use the results to supply CSS for the site.
The first problem we have is that we have very limited possibilities for storing the details of the CSS selector, and selectors can be quite complicated. All we have, really, is the attribute-set's name attribute, which is a QName, and for our purposes, is actually an NCName. (We could consider using the namespace prefix as another place to store information, but strictly speaking that would be abuse). Therefore we need to find a way to encode a complex CSS selector in the form of an NCName.
At the very least, we need to handle element names, spaces which separate them in a descendant selector, commas which separate them in a multiple selector, class names, and the periods that separate class names from element names. It would also be good to be able to encode the right-angle-bracket used for child-of. Ideally, we would be able to use all of the characters allowed in CSS.
Since we know the name of XHTML elements, and we ourselves have control over the naming of classes etc. in our project, we don't need to worry about naming collisions as long as we're careful. All QNames must start with a letter or an underscore; this slight limitation suggests that we should use a known prefix for all of them, so that we can strip that off, and therefore be limited only by the restrictions in characters which apply to NCName NameChars. NameChars consist of:
Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
Obvious decisions are to use a period for a period, and an underscore for a space; we should also avoid the colon, because of possible confusion with the namespace separator. Digits probably make a poor choice for encoding anything except digits, because they can occur in various positions in CSS selectors. First, I considered using the extenders as substitute characters. Extenders are:
#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
\u00b7, middle dot
\u02d0, modifier letter triangular colon
\u02d1, modifier letter half triangular colon
\u0387, Greek ano teleia
\u0640, Arabic tatweel
\u0E46, Thai character maiyamok
... and a bunch of Japanese and Chinese characters. These are not so useful, really. The first three might be, but the first and third would be hard to distinguish anyway.
So I fell back on using non-English letters for substitutions.
This is the list of non-letter characters we need to cover (based on the CSS2 and CSS3 selectors from
http://www.w3.org/TR/REC-CSS2/selector.html and
http://www.w3.org/TR/css3-selectors/
respectively):
* [ ] = " ~ $ ^ | - ( ) : . # [space] > + ,
The period and the dash are acceptable in an NCName; only the following will need to be substituted. These are suggested substitutions. They're meaningless, and not human-readable, but there's not much we can do about that.
* ø U+00F8 : LATIN SMALL LETTER O WITH STROKE
[ Ƹ U+01B8 : LATIN CAPITAL LETTER EZH REVERSED
] Ʒ U+01B7 : LATIN CAPITAL LETTER EZH
= ŧ U+0167 : LATIN SMALL LETTER T WITH STROKE
" ü U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
~ ñ U+00F1 : LATIN SMALL LETTER N WITH TILDE
$ ß U+00DF : LATIN SMALL LETTER SHARP S
^ ê U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX
| İ U+0130 : LATIN CAPITAL LETTER I WITH DOT ABOVE
( ʃ U+0283 : LATIN SMALL LETTER ESH
) ʅ U+0285 : LATIN SMALL LETTER SQUAT REVERSED ESH
: ʘ U+0298 : LATIN LETTER BILABIAL CLICK
# Ħ U+0126 : LATIN CAPITAL LETTER H WITH STROKE
[space] _ regular underscore
> ʌ U+028C : LATIN SMALL LETTER TURNED V
+ Ɨ U+0197 : LATIN CAPITAL LETTER I WITH STROKE
, ɹ U+0279 : LATIN SMALL LETTER TURNED R
This is the sequence:
øƸƷŧüñßêİʃʅʘĦ_ʌƗɹ
I wrote templates which function as converters between the name and selector forms, and wrote a test package to make sure they work. Then I wrote a template for outputting an <xsl:attribute-set> node in the form of a CSS ruleset. This also seems to work fine, according to my testing. The next stage is to try executing all of the tests on the server under Cocoon, with the data in eXist.
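The name/selector conversion itself is just a character-for-character substitution plus a prefix, which can be sketched in a few lines of Python. This is an illustration of the scheme, not the XSLT templates. The "css-" prefix is a hypothetical placeholder for whatever known prefix ends up being used, and the sketch assumes (as discussed above) that our own selectors never contain an underscore or any of the substitute letters.

```python
# Sketch of the selector <-> NCName encoding using the substitution table.
# The "css-" prefix is hypothetical; the real prefix has not been decided here.

PREFIX = "css-"

# CSS characters that are not allowed in an NCName, in table order...
CSS_CHARS = '*[]="~$^|():# >+,'
# ...and their substitutes in the same order (escapes avoid encoding mishaps).
SUBSTITUTES = ("\u00f8\u01b8\u01b7\u0167\u00fc\u00f1\u00df\u00ea\u0130"
               "\u0283\u0285\u0298\u0126_\u028c\u0197\u0279")

TO_NAME = str.maketrans(CSS_CHARS, SUBSTITUTES)
TO_SELECTOR = str.maketrans(SUBSTITUTES, CSS_CHARS)


def selector_to_name(selector):
    """Encode a CSS selector as a prefixed NCName."""
    return PREFIX + selector.translate(TO_NAME)


def name_to_selector(name):
    """Decode a prefixed NCName back into the CSS selector it stands for."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not an encoded selector: {name!r}")
    return name[len(PREFIX):].translate(TO_SELECTOR)


if __name__ == "__main__":
    for selector in ["div.note > p, h1", "a:hover", "*"]:
        name = selector_to_name(selector)
        print(selector, "->", name, "->", name_to_selector(name))
```

Periods, hyphens, letters and digits pass through untouched, so simple selectors such as p.note stay readable in the encoded form.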
There's no need for a template converting a CSS ruleset to an attribute-set, because the user will edit the properties and values in a GUI, and XUpdate will be used to make changes to the documents in the eXist db.
Recent work on the Mariage project has shown the most effective way to approach user-switching between styles on the site: we should use Cocoon's built-in cookie selector to set the path to XSLT and CSS files used. The exact details of this need to be worked out, though; given the limitations we have in doing includes in XSLT files (see previous postings), we won't have complete freedom to mix and match components; we may have to depend on relative paths to do that for us, so XSLT stylesheets may have to be organized into subcollections.
It turns out that the simplest way to do this is with XInclude. This is how it works:
Both XSLT files are uploaded into the database (with "Expand XIncludes" turned OFF in the client -- this is important for reasons that will be clear later). The root XSLT file has an XInclude inside it, pointing to the file that should be included:
<xi:include href="globals.xsl#xpointer(//xsl:stylesheet/*)"></xi:include>
Note: this XInclude code is actually technically incorrect, but it looks this way because the XInclude implementation on the version of eXist we're running right now is non-conformant. This has been fixed in the eXist SVN, and when we upgrade to the next stable eXist version, we should be able to use the correct code, which would look like this:
<xi:include href="globals.xsl" xpointer="xpointer(//xsl:stylesheet/*)"></xi:include>
Now we invoke the transformation in the pipeline like this:
<map:match pattern="text/*.txt">
<map:generate src="xml/{1}.xml" />
<map:transform type="xslt" src="xmldb:exist:///db/teiJournal/xsl_trans/text/text_out.xsl" />
<map:serialize type="text" />
</map:match>
What happens is that Cocoon retrieves the XSLT from the database, and as it does that, eXist expands the XInclude on the fly, and a single stylesheet is created as input to the transformation. Thus the problem of one stylesheet referring to another is avoided.
In the system we envisage, a base transformation would include components from other stylesheets which are editable by the admins and editors of the journal, so the contents of the included file will change. This is why it's important not to expand XIncludes when uploading the stylesheet in the first place: if we do that, a static copy of the original (unedited) inclusions will be permanently stored in the root stylesheet, so customizations will not have any effect.
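The difference between expanding at upload time and expanding at retrieval time can be shown with a small Python simulation. This is not eXist's API or its XInclude processor: the dictionary stands in for the database collection, and expand_includes is a hypothetical helper that handles only the one case used here (pulling in all child elements of the included stylesheet's root) rather than evaluating XPointer in general.

```python
# Simulation of on-the-fly XInclude expansion. The dict stands in for the
# eXist collection; expand_includes is a hypothetical helper that supports
# only "all children of the included root", not general XPointer.
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

ROOT_XSL = """<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="globals.xsl#xpointer(//xsl:stylesheet/*)"/>
  <xsl:template match="/"><xsl:value-of select="$siteTitle"/></xsl:template>
</xsl:stylesheet>"""

GLOBALS_XSL = """<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="siteTitle" select="'{title}'"/>
</xsl:stylesheet>"""


def expand_includes(root_source, documents):
    """Replace each top-level xi:include with the included root's children."""
    root = ET.fromstring(root_source)
    for include in list(root):
        if include.tag != XI_INCLUDE:
            continue
        href = include.get("href").split("#", 1)[0]  # drop the pointer part
        included_root = ET.fromstring(documents[href])
        position = list(root).index(include)
        root.remove(include)
        for offset, child in enumerate(list(included_root)):
            root.insert(position + offset, child)
    return root


def site_title(stylesheet_root):
    """Read the select attribute of the first xsl:variable, for inspection."""
    return stylesheet_root.find(XSL + "variable").get("select")


if __name__ == "__main__":
    db = {"globals.xsl": GLOBALS_XSL.format(title="Draft")}
    print(site_title(expand_includes(ROOT_XSL, db)))   # expanded on retrieval
    db["globals.xsl"] = GLOBALS_XSL.format(title="Edited")
    print(site_title(expand_includes(ROOT_XSL, db)))   # the edit is picked up
```

Because the root stylesheet still contains the xi:include element itself, every retrieval sees the current contents of globals.xsl; had the include been expanded once at upload, the first value would have been frozen in place.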
Did some more work on the idea of storing XSLT in the database and still using it for transformation. It seems impractical to store the root XSLT file that will be used for the transformation in the database; things work better from the point of view of Cocoon pipelines if that's in the filesystem. However, that's not a major problem, because the kinds of things we want to store in the db are string variables and attribute sets, so they'd most likely be includes anyway. Therefore I've begun testing various ways to import a stylesheet into another stylesheet from the database. This combination of sitemap pipeline and import command works:
<map:match pattern="db/xsl_trans/**.xsl">
<map:generate src="xmldb:exist:///db/teiJournal/xsl_trans/{1}.xsl"/>
<map:serialize type="xml" />
</map:match>
<xsl:import href="http://localhost:8080/cocoon/projects/mholmes/teiJournal/db/xsl_trans/text/globals.xsl" />
The main stylesheet, accessed in the normal way, imports the second stylesheet through a localhost URL, which calls the pipeline, which then transmits the stylesheet from the db as XML.
The obvious disadvantage here is the hard-coded URL; although it's based on localhost, the remaining path structure is too specific. I'm now going to experiment with every variation of relative URL I can think of, to see if I can come up with a workaround.
Spent most of the day working out the DB design (which is half-documented in a diagram, which I'll post when I've finished it). Before going too far with it, though, I needed to check that it was possible to store XSLT in the database and use it in pipelines. This proved a bit complicated.
You can of course store an XSLT file in the db, because it's just XML. Similarly, you can retrieve it using a <map:generate> or <map:read> element (although the latter is no use for our purposes because it can't be used as input in another pipeline). Another pipeline can reference the XSLT file using the xmldb:/// protocol, and this appears to work OK (at least with the default xslt transformer; using Saxon, things don't quite work, but that's another story). However, any <xsl:include> or <xsl:import> elements in the stylesheet fail, because their relative path is reconstructed using the cocoon:/ protocol, and the XSLT transformer of course knows nothing about that.
We really do need to store XSLT files in the db, because all sorts of user preferences and options will be stored in the form of xslt attribute sets. Perhaps we can use XQuery to compile a complete stylesheet? That actually makes a lot of sense -- that'll be the next thing to figure out.