URIs in output are often problematic because they contain no spaces at which the text can wrap. Added a custom XSLT function to insert a zero-width space after each slash and period, so the user agent has somewhere to break the line.
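The idea is easy to try outside the stylesheet. This is a minimal sketch in Python rather than the XSLT function itself; the name add_break_opportunities is made up for illustration, and it assumes that every slash and period is an acceptable break point.

```python
import re

# U+200B ZERO WIDTH SPACE: invisible, but gives the browser a line-break opportunity.
ZERO_WIDTH_SPACE = "\u200b"


def add_break_opportunities(uri: str) -> str:
    """Return uri with a zero-width space inserted after each slash and period.

    Hypothetical helper for illustration only; the real code is an XSLT function.
    """
    return re.sub(r"([/.])", r"\1" + ZERO_WIDTH_SPACE, uri)
```

Stripping the zero-width spaces from the result gives back the original URI, so the visible text is unchanged.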
Got some basic layout styles done, and then added the note and reference popup code. The way I've done it means that if JavaScript is turned off, note and reference links simply bounce you to the bottom of the page (or the appropriate place); if JS is on, then the href attribute is removed, and an onclick event pops up the relevant info in the right margin. This is basically working, except for some types of link (internal links to tables, appendices etc.), which need to be looked at.
But we're nearly there for the XHTML!
Document types handled so far are books, journal articles (with and without authors), book chapters, and presentations. This covers everything in the first two documents. Checking the previous entry to see if the name list is the same, and replacing with a dash, actually turns out to be unnecessary; the APA style guide doesn't seem to mention it, and shows examples of multiple items by the same authors with names shown in full (p.220, section 4.04).
Abstracted the regexp period-adding code into an external function:
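<xsl:function name="mdh:addPeriodIfNeeded" as="xs:string">
  <!-- Incoming parameter: the node whose rendered text may need a closing period -->
  <xsl:param name="inNode" as="node()" />
  <!-- Parenthesize before [last()] so that exactly one text node (the final one
       in document order) is tested; matches() fails on a multi-item sequence. -->
  <xsl:sequence
    select="if (not(matches(($inNode/descendant-or-self::text())[last()], '[.?!]$')))
            then '.'
            else ''" />
</xsl:function>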
According to the eXist docs, following:: and preceding:: will work, but not with wildcards; so following::text() is worth a shot...
Today:
- Added rendering for the document title and authors to the output.
- Re-thought the db structure for default strings and styles. It turns out we'll need GUI strings specific to individual style guides (e.g. "Retrieved [date], from [URL]" for APA), so I've subdivided the default subcollection into strings and styles subcollections, added an apa_strings.xsl file, and updated the getGuiStrings.xq code so that it collects strings from the whole default/strings subcollection. Similarly, the code for retrieving styles had to be slightly updated to take account of the db structure change.
- Greatly expanded the db style information. We're now getting down to the nitty-gritty of rendering styles as the XHTML code moves forward, and I'm making a lot of decisions about what goes in the base styles and what goes into the style-guide style document.
- Added rendering for appendices.
- Began serious work on the bibliography rendering. So far I've done books and journal articles; a lot of stuff that needs to be done for all document types is now complete, including retrieval information for electronic references, name rendering, and title handling.
Tomorrow I'll try to finish the first pass through the biblio code. One major issue still remains: finding out if the current set of authors (or editors, or whatever) is the same as the previous set, so that a dash should be used. That may take some thought.
Some of the code written today makes good use of the new features in XSLT/XPath 2.0 (for instance, I use a regular expression match to determine whether a title ends with punctuation or not, so I can add a period in the reference only when it's needed). The power of this has got me thinking about the possibility that the old thorny issue of commas inside quote marks could be handled this way. Imagine that, when rendering an article title in the text, the code looks ahead to the next text() node to see if it starts with punctuation. If it does, it grabs that punctuation and includes it before the closing quote; similarly, a text()-matching template could check the preceding sibling to see if it's an element that would be rendered with quotes, and if so, remove any leading punctuation. That would be cool. It would require a list of all elements that are rendered with quotes, to check against. The only thing not quite clear to me yet is how to reliably find the text() node immediately following the quoted element. There is a following:: axis, so following::text() should do it, but IIRC eXist doesn't yet support this axis.
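To check that the look-ahead idea hangs together, here is a rough Python sketch of the logic (not XSLT, and not existing code). It assumes the content has already been flattened into a list of (kind, text) pairs, that only commas and periods get pulled inside the quotes, and that straight double quotes are used; the names render and MOVABLE are made up for illustration.

```python
# Punctuation that moves inside the closing quote. An assumption: the final
# list depends on the style guide.
MOVABLE = ",."


def render(segments):
    """Join (kind, text) segments, pulling leading punctuation into quotes.

    kind is "quoted" for content rendered inside quote marks and "text" for
    ordinary text. Both labels are hypothetical stand-ins for the real elements.
    """
    out = []
    strip_next = False  # True when the previous quoted segment took a character
    for i, (kind, text) in enumerate(segments):
        if kind == "quoted":
            grabbed = ""
            # Look ahead: does the immediately following text start with punctuation?
            if i + 1 < len(segments):
                next_kind, next_text = segments[i + 1]
                if next_kind == "text" and next_text and next_text[0] in MOVABLE:
                    grabbed = next_text[0]
            out.append('"' + text + grabbed + '"')
            strip_next = bool(grabbed)
        else:
            # Mirror of the look-ahead: drop the character the quote already took.
            if strip_next:
                text = text[1:]
            strip_next = False
            out.append(text)
    return "".join(out)
```

The two halves correspond to the two templates described above: the quoted element looks forward, and the text that follows it looks back.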
Cracked a major set of problems for teiJournal today.
Display styles are stored in three different places in the database: base_styles.xsl,
[styleguide]_styles.xsl, and the user's styles.xsl (containing customized styles). Each file contains a block of <xsl:attribute-set> elements, each of which represents a ruleset, and contains a set of <xsl:attribute> elements, each of which represents a property and a value.
These blocks then have to be combined in an intelligent way. The basic hierarchy is that base styles are overridden by any applicable styles in the style guide, and those styles are overridden by any in the user styles file. So any user styles replace styles from the other two files, style guide styles replace base styles, and base styles are output where there are no overrides in the other two files. Furthermore, where there are rulesets or rules in either of the two lower files which are not represented in their ancestors, these need to be output as well.
The big step forward today was the creation of an XQuery file capable of doing this cascading combination. The resulting code is pretty small, and worth documenting in full. There are two functions, f:getCombinedDoc(), which retrieves all the rulesets from the three documents in the database and joins them together into one file, and then f:getAttributeSets(), which combines rulesets together, ignoring any overridden rules, to produce a single source in the form of an <xsl:stylesheet> document. The functions look like this:
declare function f:getCombinedDoc() as element(){
let $guideId := request:get-parameter('guide', 'apa'),
$base := doc('/db/teiJournal/settings/default/base_styles.xsl'),
$guidePath := concat('/db/teiJournal/settings/default/', $guideId, '_styles.xsl'),
$guide := doc($guidePath),
$user := doc('/db/teiJournal/settings/user/styles.xsl')
return
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
{$base//xsl:attribute-set}
{$guide//xsl:attribute-set}
{$user//xsl:attribute-set}
</xsl:stylesheet>
};
declare function f:getAttributeSets() as element()*{
let $doc := f:getCombinedDoc()
for $setName in distinct-values($doc//xsl:attribute-set/@name)
return
<xsl:attribute-set name="{$setName}">
{
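(: Parenthesize before [last()] so that the final definition in document order
   (base, then guide, then user) wins. Without the parentheses, last() is
   evaluated per parent attribute-set and overridden rules are output too. :)
for $attName in distinct-values($doc/xsl:attribute-set[@name=$setName]/xsl:attribute/@name)
return
($doc/xsl:attribute-set[@name=$setName]/xsl:attribute[@name=$attName])[last()]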
}
</xsl:attribute-set>
};
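The cascade itself is easier to see with the XML stripped away. Below is a small Python sketch of the same override order, assuming each file has been reduced to a dictionary of ruleset name to property/value pairs; the function name cascade and that data shape are illustrative only, not part of the teiJournal code.

```python
def cascade(*layers):
    """Merge style layers in order; later layers override earlier ones.

    Each layer maps a ruleset name to a dict of property -> value.
    Rulesets and properties found in only one layer are carried through.
    """
    combined = {}
    for layer in layers:
        for ruleset, rules in layer.items():
            # setdefault creates the ruleset the first time it is seen;
            # update overwrites only the properties the current layer defines.
            combined.setdefault(ruleset, {}).update(rules)
    return combined
```

Calling cascade(base, guide, user) gives the same priority as the XQuery: user styles beat style-guide styles, which beat base styles, and anything defined in only one file survives untouched.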
The resulting document is then passed on to an XSLT transformation which turns it into a real CSS stylesheet. The pipeline looks like this:
<map:match pattern="*/style.css">
<map:generate src="xq/getStyleSheet.xq" type="xquery">
<map:parameter name="guide" value="{1}" />
</map:generate>
<map:transform type="saxon" src="xsl/attribute_sets_to_css.xsl"/>
<map:serialize type="text" mime-type="text/css" />
</map:match>
Note the mime-type attribute on the serializer: without this, the file is served as text/plain, and the browser fails to interpret it as CSS, so it doesn't apply it to the Web page (this took half an hour to figure out).
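For anyone who wants to picture the last step without reading the XSLT, here is a Python sketch of turning merged rulesets into CSS text. It is not what attribute_sets_to_css.xsl actually contains; in particular, using the ruleset name directly as the CSS selector is an assumption made for the example, as is the function name to_css.

```python
def to_css(rulesets):
    """Serialize {selector: {property: value}} as CSS text.

    Assumes the ruleset name can be used verbatim as the selector, which
    may not match the real naming scheme.
    """
    blocks = []
    for selector, rules in rulesets.items():
        body = "\n".join(f"  {prop}: {value};" for prop, value in rules.items())
        blocks.append(f"{selector} {{\n{body}\n}}")
    return "\n\n".join(blocks) + "\n"
```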
So styles are now being applied to the XHTML output, and I can begin refining those styles by building the attribute-set files. Meanwhile, the last bits and pieces of the XHTML output itself need to be completed (appendix and biblio handling).
All the XML tags we're currently using in the body text are now handled, including all the <hi> variants, <mentioned>, <term>, <soCalled>, etc. Images are also handled -- they show up with a class which will constrain their size, but they are also links to the full image. There's a slight oddity with the way FF shows images when they're not part of a document, though; I've posted a query on the Cocoon list about that.
Notes are handled as in the Mariage site, except that they're simpler (no refs to annotations to handle). The JS that pops them up will follow the model of that on EMLS.
Actually, I'm wondering if we could manage to handle note links (from markers to actual notes) in a couple of ways, depending on whether JavaScript is turned on. We could default to a straight link (<a> tag) which would bounce you down the page to the note in the note list; then, on page load, if JS is turned on we could have a function which iterates through all the note links, and hides the <a> tag, inserting another tag right after it which pops up the note in the right margin, as on EMLS. This would be elegant and flexible.
This is handy when you're checking whether your XSLT handles all the elements your documents happen to use:
distinct-values(//*/descendant-or-self::*/name(.))
Started working through the basic structure of the XHTML output. The headings (APA stuff) were a bit tricky, but I've figured it out; headings are always h2 down to h6 in XHTML tag terms, but they also get a class attribute which is based on the level they're at and the number of levels, so we can style them appropriately. Lists, tables and quotes are handled, as are names and abbreviations. There are still titles, figures, graphics, notes, mentioned/soCalled/term etc., and the dreaded bibliography to do. There's also the wrinkle that appendices may have nested headings, and those headings are styled based on THEIR nesting level, not the levels in the main text. However, I think it's reasonable to assume no more than three levels in appendix headers, so we can avoid a lot of calculation that way.
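The heading scheme boils down to a small mapping from (level, number of levels) to a tag and a class. This Python sketch shows the shape of it; the class-name pattern "level2of4" and the function name heading_markup are invented for the example, and it assumes at most five levels so that everything fits between h2 and h6.

```python
def heading_markup(level: int, total_levels: int) -> tuple:
    """Return (tag, css_class) for a heading.

    level is the heading's depth (1 = top level); total_levels is the number
    of heading levels the document uses. The class-name pattern is hypothetical.
    """
    if not 1 <= level <= total_levels <= 5:
        raise ValueError("expected 1 <= level <= total_levels <= 5")
    tag = f"h{level + 1}"  # level 1 -> h2 ... level 5 -> h6
    css_class = f"level{level}of{total_levels}"
    return tag, css_class
```

Because the class carries both numbers, the stylesheet can give a second-level heading one look in a two-level article and another in a four-level article without the XHTML structure changing.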
We have a slightly interesting dilemma which is the result of some oddities in the APA style. Articles may have multiple levels of header in them, if they're divided into sections (both the articles I've worked on so far have two levels of header). APA, rather strangely, chooses to style headers based on the number of levels that happen to be present in the article; so, for example, where there are two or three levels, the second level is aligned left, but where there are four levels, the second level is centred, and the third level is left-aligned instead. For full details, see the APA Style Guide section 3.32.
Quite frankly, I think this is astoundingly silly, and so does everyone else I've shown it to. It means that the second level heading in one article may well be styled differently from the second level heading in another article. When we get to five levels, it gets even sillier; an ugly all-caps header is inserted at the top level, pushing all the other levels down, and making that particular article look radically different from others which use fewer levels of heading.
I've never really worked seriously with APA before, so this is new to me. Chicago and MLA seem to have nothing to say about it, other than Chicago's pragmatic assertion that levels of heading are "differentiated by type style and placement" (1.74). It makes for an interesting problem for XSLT and CSS, to say the least!