Much progress with PDF generation, and one problem
I'm making some progress with PDF generation; I have divs, headings, paragraphs, external links, images and a variety of other features working. However, I've hit one problem:
XSL-FO has properties called keep-with-previous and keep-with-next, which are used to prevent the orphaning of titles at the bottom of pages, and similar layout oddities. The FO-to-PDF converter in Cocoon 2.1 is Apache FOP 0.20.5, which is old, and which doesn't support the keep-with properties. That causes an occasional output problem: a title can end up at the bottom of a page, separated from the paragraph that follows it. For our serious print publishing, we use the RenderX XEP engine, which has virtually complete XSL-FO support. However, XEP costs a lot of money ($4,000 for a single-core, single-CPU license), so it can't be a default part of the teiJournal project, which is wholly open-source.
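For reference, here is a minimal sketch of how keep-with-next is typically applied to a heading in XSL-FO. The page dimensions, font settings and text are placeholders rather than anything taken from the teiJournal stylesheets; the only point is the keep-with-next.within-page attribute on the heading block, which asks the formatter to keep the heading on the same page as the block that follows it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Placeholder A4 page; the real page master will differ. -->
    <fo:simple-page-master master-name="page"
        page-height="29.7cm" page-width="21cm"
        margin-top="2cm" margin-bottom="2cm"
        margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- The heading asks to stay on the same page as the next block. -->
      <fo:block font-size="14pt" font-weight="bold"
          space-before="12pt" space-after="6pt"
          keep-with-next.within-page="always">A section heading</fo:block>
      <!-- The paragraph the heading should not be separated from. -->
      <fo:block>The first paragraph of the section.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

A formatter that honours the property (XEP, for example) will push the heading to the next page rather than leave it stranded; a formatter that ignores it will simply break the page wherever the text runs out.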
Meanwhile, both FOP and Cocoon are moving forward; Cocoon 2.2 is out, and includes a more modern version of FOP which supports the properties we need. However, Cocoon 2.2 is a completely different animal from 2.1, with a totally different structure; moreover, the XML database we use (eXist) is available in a package with Cocoon 2.1, but no such package exists for Cocoon 2.2. So the situation is this:
Right now we can't ensure that PDFs avoid the orphaning problem in teiJournal (although if IALLT Journal wished to, they could pay $4,000 for XEP and solve the problem). In the future -- over about two to three years, I estimate -- eXist will probably move to Cocoon 2.2, or we'll learn to build Cocoon 2.2 with eXist; teiJournal will then be ported to Cocoon 2.2, which will solve the problem. So we're looking at occasional orphaning problems in some articles for two or three years.
While I'm posting, a reminder to myself about PDF output development and the caching problems we have with it. First, remember that the browser usually caches a PDF download, so you need to clear the browser cache before grabbing an updated copy when working on PDF output. Second -- and this is a killer that I'd forgotten about -- when multiple XSLT stylesheets are being called, Cocoon will cache the results of a transformation unless the root stylesheet has changed. Therefore, if you're actually coding in a different stylesheet, you need to make a quick edit to the root stylesheet and upload it in order to trigger a refresh in the Cocoon pipeline. Cocoon doesn't know that the root stylesheet invokes other files, so it doesn't check to see whether they've changed.
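One way to make that "quick edit" harmless is to keep a throwaway comment in the root stylesheet and change it whenever one of the other stylesheets is edited. The sketch below assumes that any change to the root file is enough to invalidate the cached result, as described above; the file names and the "rev" marker are hypothetical, not the actual teiJournal layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Cache marker (hypothetical convention): bump the number below and
       re-upload this file after editing any included stylesheet.
       rev: 2 -->

  <!-- Hypothetical module names; the real files may be organised differently. -->
  <xsl:include href="fo_blocks.xsl"/>
  <xsl:include href="fo_inline.xsl"/>

</xsl:stylesheet>
```

Because the marker lives in a comment, changing it has no effect on the transformation output; it only alters the root file so that the pipeline treats it as modified.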