The author of the third article wanted to include much more detailed information in the affiliation tag, so I've added the new info, and changed the XSLT so that it simply inserts a linebreak after the name and then processes the rest of the contents as they are. <lb> tags have to be used to divide lines, because we can't have <p> tags inside <affiliation>.
Today I got the sorting stuff written and tested. It's basically a bunch of nested if-then-elses, with some fancy footwork to handle sorting on two items at once (such as volume+issue). This sorting does include sorting by title, but with XSLT we can also call our own little Java library containing a sort comparator, which enables us to handle e.g. sorting titles while ignoring leading articles. We'll probably sort the returns in XQuery anyway, but pass the sort key on to the XSLT so that it can decide if it needs to sort them again with a more sophisticated comparator.
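As an illustration of the kind of comparator meant here, this is a minimal sketch only. The class name TitleComparator, the sortKey helper, and the English-only article list are all assumptions for the example, not the contents of our actual library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical class: compares titles while ignoring a leading article.
final class TitleComparator implements Comparator<String> {
    // Assumption: English articles only; other languages need their own list.
    private static final Pattern LEADING_ARTICLE =
        Pattern.compile("^(the|an|a)\\s+", Pattern.CASE_INSENSITIVE);

    // Builds the string that is actually compared: trimmed, article removed, lower-cased.
    static String sortKey(String title) {
        String trimmed = title.trim();
        return LEADING_ARTICLE.matcher(trimmed).replaceFirst("").toLowerCase(Locale.ROOT);
    }

    @Override
    public int compare(String a, String b) {
        return sortKey(a).compareTo(sortKey(b));
    }

    // Convenience helper: returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(new TitleComparator());
        return copy;
    }
}
```

Because the article must be followed by whitespace, a title such as "Another Day" keeps its first word. A plain string comparison is used for the keys; a locale-aware java.text.Collator would probably be a better choice for accented titles.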
I also started work on the text search, which is a more complicated business. First, we need to split up the search string into a sequence of items. Items can be:
- quoted strings (for exact matches), or
- individual words.
Each item can be preceded by
- + (must match),
- - (must not match), or
- nothing (optional match).
Each non-quoted string may contain wildcards:
- * (any char(s))
- ? (any single char)
For each item, we must generate an XQuery expression, and then we have to string them all together in an appropriate way to create a single clause.
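The splitting step described above could be sketched roughly as follows. This covers only the parsing, not the generation of XQuery expressions, and every name in it (SearchParser, Item, Requirement, wildcardToRegex) is hypothetical. It also assumes that * means zero or more characters, which the notes above leave open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the Google-style search syntax described above.
final class SearchParser {
    enum Requirement { MUST, MUST_NOT, OPTIONAL }

    static final class Item {
        final Requirement requirement;
        final boolean quoted;   // true for "exact match" phrases
        final String text;      // phrase or word, without the prefix or quotes

        Item(Requirement requirement, boolean quoted, String text) {
            this.requirement = requirement;
            this.quoted = quoted;
            this.text = text;
        }

        // Wildcards are only meaningful in non-quoted items.
        boolean hasWildcard() {
            return !quoted && (text.indexOf('*') >= 0 || text.indexOf('?') >= 0);
        }
    }

    // Optional + or - prefix, then either a quoted phrase or a run of non-space characters.
    private static final Pattern ITEM =
        Pattern.compile("([+-]?)(?:\"([^\"]*)\"|(\\S+))");

    static List<Item> parse(String query) {
        List<Item> items = new ArrayList<>();
        Matcher m = ITEM.matcher(query);
        while (m.find()) {
            String prefix = m.group(1);
            Requirement req = prefix.equals("+") ? Requirement.MUST
                            : prefix.equals("-") ? Requirement.MUST_NOT
                            : Requirement.OPTIONAL;
            boolean quoted = m.group(2) != null;
            String text = quoted ? m.group(2) : m.group(3);
            if (!text.isEmpty()) {
                items.add(new Item(req, quoted, text));
            }
        }
        return items;
    }

    // Turns a word containing * and ? into a Java regex, quoting everything else literally.
    static String wildcardToRegex(String word) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }
}
```

For instance, parsing +"exact match" -foo ba?r should give three items: a required phrase, an excluded word, and an optional word with a wildcard. An unterminated quote is simply treated as part of a word in this sketch.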
I've done this job in a less efficient and more constrained way for the Mariage project, but this is an opportunity to create a really good mapping between Google-style search syntax and eXist search functionality. There are significant problems, of course, especially with match-highlighting, but I think we can do a reasonable job.
Got the framework of the XQuery done, with search filters such as volume, issue, author, keyword etc. working. Still have to get the search system working, with useful match returns -- should the XQuery do the KWIC construction, or the XSLT? -- and I also need to get ordering working.
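Whichever layer ends up doing it, the KWIC construction itself is simple enough to sketch. The following is an illustration in Java rather than a decision about XQuery versus XSLT; the class name Kwic, the square-bracket highlight markers, and the fixed character window are assumptions made for the example.

```java
import java.util.Optional;

// Hypothetical keyword-in-context helper: first match only, fixed-width context.
final class Kwic {
    // Case-insensitive search that keeps indexes valid for the original string.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the first match with up to `width` characters of context on each side.
    // The match is wrapped in [ ] as a stand-in for real highlight markup.
    static Optional<String> snippet(String text, String term, int width) {
        if (term.isEmpty()) {
            return Optional.empty();
        }
        int start = indexOfIgnoreCase(text, term);
        if (start < 0) {
            return Optional.empty();
        }
        int end = start + term.length();
        int from = Math.max(0, start - width);
        int to = Math.min(text.length(), end + width);
        String result = (from > 0 ? "..." : "")
            + text.substring(from, start)
            + "[" + text.substring(start, end) + "]"
            + text.substring(end, to)
            + (to < text.length() ? "..." : "");
        return Optional.of(result);
    }
}
```

A real version would presumably cut the context at word boundaries and return every match rather than just the first.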
The plan to merge TOC and Search is a good one, though. We could actually put the search form on every page, if we want. Right now, the TOC/search is handling only texts in the /texts/ subcollection, but if the rest of the site content is to be stored in the db too -- which I think it should -- then we ought to allow an option to search that material as well. That should perhaps be a switch, but I'm not sure yet how best to implement it.
Got the third article completed and posted. In the process, I made a couple of tweaks to the CSS (line-height and font sizes) to make the page look a bit easier on the eye.
Did some work on the Bibliographical Markup page of the encoding documentation. It's still in need of worked examples of <biblStruct> elements, but I'll add those soon.
Managed to make some progress this morning, despite network issues. This is my report to the group:
I've just had a chance to do the biblio on the latest article. I looked through the APA guide, including the electronic resources PDF, and there's no specific mention of video or DVD at all, as far as I can see; the key category seems to be "Motion picture". Accordingly, I've decided to treat the item as a motion picture, and relegate the information about the format to a note, like this:
<biblStruct xml:id="dörnyei_2005" rend="video">
  <monogr>
    <respStmt>
      <resp>Speaker</resp>
      <name>
        <forename>Z.</forename>
        <surname>Dörnyei</surname>
      </name>
    </respStmt>
    <title level="m">A closer look at Motivation in the language learning classroom</title>
    <imprint>
      <pubPlace>Stirling</pubPlace>
      <publisher>Scottish Centre for Information on Language Teaching and Research (Professional Services), University of Stirling</publisher>
      <date when="2005"></date>
    </imprint>
  </monogr>
  <note>DVD and online video.</note>
  <note><ref target="http://www.scilt.stir.ac.uk/dvd/index.html"><date notAfter="2007-09-18">September 18, 2007</date></ref></note>
</biblStruct>
This has the advantage of staying neutral as to which format (DVD or online video) is primary. I've written rendering code for it, and you can see the output on the page. There's no content in the doc at the moment, just the framework and the bibliography. So far we're handling the following reference types:
- book
- book chapter
- journal article
- presentation
- video
and we'll be adding more as we go along. Meanwhile, I'm refining the basic code every time, to handle different contingencies that crop up (such as the need to make Dörnyei a "speaker" on the video, rather than an author or an editor).
I've also begun work on the table-of-contents code, which is still in its early stages; it needs to be fairly complex, because I'm envisaging it as combined with the search system. The idea is that the complete TOC of articles would be available on one page, but there would be a "Search/Filter" button at the top; if you click that, a form will appear, where you can choose to filter the article list by a range of different criteria, including volume, start date, end date, author, type of article, keywords, and even search text; and you'll be able to sort the results in various ways. Clicking an "Apply" button would retrieve the new results and show them as a TOC. If you've searched for text, then you'll also see gobbets of text with the search term highlighted, in the TOC list. Any of these searches or TOCs would be accessible through a URL (all the parameters are in GET variables), so anyone could send out the URL of a list of articles as a "collection". I'm hoping to have a basic TOC available within a week or so, but the searching/filtering bit might take a little longer.
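To make the "collection URL" idea concrete, here is a small sketch of how a set of filter values could be turned into a shareable GET URL. The class name TocUrl and the parameter names used in the example (volume, author, text) are hypothetical; the real names will depend on how the form and the XQuery end up being written.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a TOC/search URL from filter parameters.
final class TocUrl {
    // Appends every non-empty parameter to the base path as an encoded query string.
    // Iteration order follows the map, so pass a LinkedHashMap for a stable URL.
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue; // unset filters are simply left out of the URL
            }
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return query.length() == 0 ? base : base + "?" + query;
    }
}
```

Leaving out empty filters keeps the URLs short, and an unfiltered request falls back to the plain TOC address. Note that URLEncoder.encode(String, Charset) needs Java 10 or later.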
I'm working on a script called contents.xq which will retrieve a list of documents from the db based on a very flexible series of parameters, and return them as a well-formed TEI document containing a <listBibl> of <biblStruct>s. The basics are in place, but I'm stuck on one really annoying detail: I'd like to include the request URL from which the data was generated, and I can't find a way to get that information, or pass it into the XQuery. I'm sure it's possible through the sitemap, but I can't figure it out yet. If it proves too difficult, I can probably pass it into the transformation instead.
Wrote a bit more of the XSLT-to-CSS code.
Converting between CSS and XSL attribute sets in XSLT is necessary for storing styles in the db, and a bit fiddly to do manually, so I've been working on a Java app to handle it. CSS to attribute-sets is working, and I'm halfway through the reverse. This code may be repurposed in a Java library that could be called from a Cocoon pipeline, as part of the editor's GUI.
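The CSS-to-attribute-set direction can be illustrated with a very small sketch. This is not the app itself: the class name CssToAttributeSet is made up for the example, it handles only the declarations of a single rule, and it assumes each CSS property name can be reused unchanged as the attribute name, which holds for many properties but not all.

```java
// Hypothetical converter: one CSS declaration block -> one xsl:attribute-set.
final class CssToAttributeSet {
    // `declarations` is the text between the braces of a CSS rule,
    // e.g. "line-height: 1.5; font-size: 12pt".
    static String convert(String setName, String declarations) {
        StringBuilder out = new StringBuilder();
        out.append("<xsl:attribute-set name=\"").append(escape(setName)).append("\">\n");
        // Limitation: a naive split on ';' breaks on values that contain semicolons.
        for (String decl : declarations.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // not a property: value pair
            }
            String prop = decl.substring(0, colon).trim();
            String value = decl.substring(colon + 1).trim();
            if (prop.isEmpty() || value.isEmpty()) {
                continue;
            }
            out.append("  <xsl:attribute name=\"").append(escape(prop)).append("\">")
               .append(escape(value)).append("</xsl:attribute>\n");
        }
        out.append("</xsl:attribute-set>\n");
        return out.toString();
    }

    // Minimal XML escaping for attribute values and text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

The reverse direction would need a proper XML parser rather than string handling, which is part of why it is the fiddlier half.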
Made some final fixes to XSLT and CSS based on the W3C validator, and reported to the editors as follows:
I've finished a first pass through generating the XHTML output from the two documents we've marked up so far. The results are here and here.
Some points to note:
- Both XHTML and CSS validate, according to the W3C validator.
- My choices as to fonts, spacing, sizing etc. are just arbitrary; I'll need your input into that.
- Wherever APA has anything to say about how something should be presented, I've tried to follow it (for instance, in the display of tables with no vertical and few horizontal lines), but I may have missed some APA diktats, so do let me know if you see anything odd.
- Notes and references work as popups on the right of the text (hence the larger margin on the right). However, if the user has JavaScript turned off, they should just work as straightforward internal links.
- Images embedded in the text are links to full-size versions of themselves.
- Metadata is embedded in the source of the document as Dublin Core meta tags. I plan to add some JavaScript which can parse it out from the header and present it to the user in a more human-readable form, but the meta tags are important for machine-reading, and for folks who turn off their JavaScript.
I think it would be useful at this stage to concentrate on making sure all the display features are working as they should, and following APA, and on making some basic choices about font style and display characteristics. Once we've got an XHTML appearance we're happy with, I can start porting that over into the XSL:FO/PDF output, which is a bit more tricky.
Other things on my mind:
I'm wondering if it would be useful to have a view of the text in which the JavaScript and CSS are embedded directly into the document, so that it would function as a single portable file. This portability would be undermined by the fact that images would still have to be externally linked, though, so perhaps it's pointless.
Tables have a minimum width which is determined by the minimum wrappable size of their content cells, so they sometimes stick out beyond the text column -- see, for instance, Table 3 in Yaden, with your browser window sized a bit smaller than usual.
This is probably unavoidable, but if you'd like to put some thought into ways to avoid it, I'd be happy to have suggestions.