My plan is to store the data for CSS files in the eXist db in the form of xsl:attribute-sets. This presents a number of challenges, and I got to work on one of them today.
This approach is useful because it enables us to store CSS data in a highly-structured format, so that we can
read and write individual properties and values in the database; thus we can allow the user to
customize the layout and appearance of documents through a browser-based GUI, and use the
results to supply CSS for the site.
The first problem is that we have very limited possibilities for storing the details of
the CSS selector, and selectors can be quite complicated. All we have, really, is the attribute-set's name attribute, which is a QName, and for our purposes, is actually an NCName. (We could consider using the namespace prefix as another place to store information, but strictly speaking that would be abuse). Therefore we need to find a way to encode a complex CSS selector in the form of an NCName.
At the very least, we need to handle element names, the spaces which separate them in a descendant selector, the commas which separate them in a multiple selector, class names, and the periods that separate class names from element names. It would also be good to be able to encode the right angle bracket used for child-of. Ideally, we would be able to use all of the characters allowed in CSS.
Since we know the names of XHTML elements, and we ourselves have control over the naming of classes etc. in our project, we don't need to worry about naming collisions as long as we're careful. All QNames must start with a letter or an underscore; this slight limitation suggests that we should use a known prefix for all of them, so that we can strip that off, and therefore be limited only by the restrictions on characters which apply to NCName NameChars. NameChars consist of:
Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
Obvious decisions are to use a period for a period, and an underscore for a space. The colon is not available: although it appears in the XML NameChar production, it is excluded from NCNames because it is the namespace separator. Digits probably make a poor choice for encoding anything except digits, because they can occur in various positions in CSS selectors. First, I considered using the extenders as substitute characters. Extenders are:
#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
U+00B7 : MIDDLE DOT
U+02D0 : MODIFIER LETTER TRIANGULAR COLON
U+02D1 : MODIFIER LETTER HALF TRIANGULAR COLON
U+0387 : GREEK ANO TELEIA
U+0640 : ARABIC TATWEEL
U+0E46 : THAI CHARACTER MAIYAMOK
... plus a Lao character (U+0EC6) and a bunch of Japanese and Chinese characters. These are not so useful, really. The first three might be, but the first and third would be hard to distinguish anyway.
So I fell back on using non-English letters for substitutions.
This is the list of non-letter characters we need to cover (based on the CSS2 and CSS3 selectors from
http://www.w3.org/TR/REC-CSS2/selector.html and
http://www.w3.org/TR/css3-selectors/
respectively):
* [ ] = " ~ $ ^ | - ( ) : . # [space] > + ,
The period and the dash are acceptable in an NCName; only the following will need to be substituted. These are suggested substitutions. They're meaningless, and not human-readable, but there's not much we can do about that.
* ø U+00F8 : LATIN SMALL LETTER O WITH STROKE
[ Ƹ U+01B8 : LATIN CAPITAL LETTER EZH REVERSED
] Ʒ U+01B7 : LATIN CAPITAL LETTER EZH
= ŧ U+0167 : LATIN SMALL LETTER T WITH STROKE
" ü U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
~ ñ U+00F1 : LATIN SMALL LETTER N WITH TILDE
$ ß U+00DF : LATIN SMALL LETTER SHARP S
^ ê U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX
| İ U+0130 : LATIN CAPITAL LETTER I WITH DOT ABOVE
( ʃ U+0283 : LATIN SMALL LETTER ESH
) ʅ U+0285 : LATIN SMALL LETTER SQUAT REVERSED ESH
: ʘ U+0298 : LATIN LETTER BILABIAL CLICK
# Ħ U+0126 : LATIN CAPITAL LETTER H WITH STROKE
[space] _ regular underscore
> ʌ U+028C : LATIN SMALL LETTER TURNED V
+ Ɨ U+0197 : LATIN CAPITAL LETTER I WITH STROKE
, ɹ U+0279 : LATIN SMALL LETTER TURNED R
This is the sequence:
øƸƷŧüñßêİʃʅʘĦ_ʌƗɹ
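To make the mapping concrete, here is a minimal Python sketch of the two conversions (my actual converters are XSLT templates; this is only an illustration of the same table). The prefix "css-" and the function names are assumptions made up for the example, and the sketch simply refuses selectors that already contain one of the substitute characters, since those could not be decoded unambiguously.

```python
# CSS character -> NCName-safe stand-in, copied from the table above.
SUBSTITUTIONS = {
    "*": "\u00f8",  # ø
    "[": "\u01b8",  # Ƹ
    "]": "\u01b7",  # Ʒ
    "=": "\u0167",  # ŧ
    '"': "\u00fc",  # ü
    "~": "\u00f1",  # ñ
    "$": "\u00df",  # ß
    "^": "\u00ea",  # ê
    "|": "\u0130",  # İ
    "(": "\u0283",  # ʃ
    ")": "\u0285",  # ʅ
    ":": "\u0298",  # ʘ
    "#": "\u0126",  # Ħ
    " ": "_",
    ">": "\u028c",  # ʌ
    "+": "\u0197",  # Ɨ
    ",": "\u0279",  # ɹ
}

# Hypothetical prefix (the post does not fix one). It guarantees that the
# name starts with a letter even when the selector starts with "." or "#".
PREFIX = "css-"

_TO_NAME = str.maketrans(SUBSTITUTIONS)
_TO_SELECTOR = str.maketrans({v: k for k, v in SUBSTITUTIONS.items()})


def selector_to_name(selector: str) -> str:
    """Encode a CSS selector as a prefixed, NCName-style string."""
    clash = set(selector) & set(SUBSTITUTIONS.values())
    if clash:
        # A stand-in character inside the selector itself would be
        # decoded as punctuation, so reject it up front.
        raise ValueError(f"selector contains reserved characters: {sorted(clash)}")
    return PREFIX + selector.translate(_TO_NAME)


def name_to_selector(name: str) -> str:
    """Decode a name produced by selector_to_name back into a selector."""
    if not name.startswith(PREFIX):
        raise ValueError(f"name does not start with {PREFIX!r}")
    return name[len(PREFIX):].translate(_TO_SELECTOR)
```

For example, selector_to_name("div.note p") gives css-div.note_p, and name_to_selector turns that back into the original selector. Note that the check also rejects underscores in class names, which is the price of using the underscore for a space.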
I wrote templates which function as converters between the name and selector forms, and wrote a test package to make sure they work. Then I wrote a template for outputting an <xsl:attribute-set> node in the form of a CSS ruleset. This also seems to work fine, according to my testing. The next stage is to try executing all of the tests on the server under Cocoon, with the data in eXist.
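As a rough illustration of the ruleset output (again in Python rather than the XSLT I actually used), the sketch below turns one attribute-set element into CSS text. It assumes that each xsl:attribute child carries a CSS property name in its name attribute and the value as its text content, and that names carry a hypothetical "css-" prefix; neither detail is fixed above.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
PREFIX = "css-"  # hypothetical prefix, stripped before decoding

# Stand-in characters and the CSS characters they represent, in table order.
_TO_SELECTOR = str.maketrans(
    "\u00f8\u01b8\u01b7\u0167\u00fc\u00f1\u00df\u00ea\u0130"
    "\u0283\u0285\u0298\u0126_\u028c\u0197\u0279",
    '*[]="~$^|():# >+,',
)


def attribute_set_to_css(attribute_set: ET.Element) -> str:
    """Render one xsl:attribute-set element as a CSS ruleset string."""
    name = attribute_set.get("name", "")
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    selector = name.translate(_TO_SELECTOR)
    # One declaration per xsl:attribute child: "property: value;"
    declarations = [
        f"  {attr.get('name')}: {(attr.text or '').strip()};"
        for attr in attribute_set.findall(f"{{{XSL_NS}}}attribute")
    ]
    return selector + " {\n" + "\n".join(declarations) + "\n}"


example = ET.fromstring(
    '<xsl:attribute-set xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'name="css-div.note_p">'
    '<xsl:attribute name="color">red</xsl:attribute>'
    '<xsl:attribute name="margin-top">1em</xsl:attribute>'
    "</xsl:attribute-set>"
)
print(attribute_set_to_css(example))
```

With the sample element, the selector comes out as "div.note p" with two declarations inside the braces.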
There's no need for a template converting a CSS ruleset to an attribute-set, because the user will edit the properties and values in a GUI, and XUpdate will be used to make changes to the documents in the eXist db.