Discussion of print dictionaries, collation for orthography, vacations and timing, and other issues.
Opening old NetBeans 6 projects in NetBeans 7 resulted in errors because the JUnit libraries couldn't be found. I had to right-click on Test Libraries, choose Add JAR/Folder, then choose /usr/share/java/junit4.jar to resolve the missing dependency. Took a while to figure that out, curses...
We currently sort our Moses entries based on the phonemic representation, using a Java comparator I wrote specifically for the project. Now that we're going to have orthographical representations, the sort order will have to be amended to take account of them. I'm therefore reviving the NetBeans project for the MosesCollation, and beginning to update it.
This is the current sort order.
We'll need to add the following characters to the list:
- č (u010d) (does it sort before or after c?)
- š (u0161) (does it sort before or after s?)
- x̌ (x + u030c) (does it sort before or after x?)
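To make the ordering question concrete, here is a minimal sketch of a letter-order comparator. It is not the project's actual comparator: the class name, the `sample()` ordering (each new letter placed directly after its base letter) and the inclusion of "a" are assumptions for illustration only. The one point it is meant to show is that x̌ is two code points (x plus a combining caron), so the comparator has to match the longest letter first rather than compare character by character.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Compares strings by a custom letter order; multi-code-point letters count as one unit. */
class LetterOrderComparator implements Comparator<String> {
    private final List<String> order;

    LetterOrderComparator(String... letters) {
        this.order = Arrays.asList(letters);
    }

    /** Turns a string into a list of letter ranks, always preferring the longest matching letter. */
    List<Integer> keys(String input) {
        // NFC makes "c + combining caron" equal to the precomposed letter;
        // x + caron has no precomposed form and stays as two code points.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        List<Integer> keys = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int best = -1;
            int bestLen = 0;
            for (int k = 0; k < order.size(); k++) {
                String letter = order.get(k);
                if (letter.length() > bestLen && s.startsWith(letter, i)) {
                    best = k;
                    bestLen = letter.length();
                }
            }
            if (best < 0) {
                // Unknown character: rank it after every known letter, by code unit value.
                keys.add(order.size() + s.charAt(i));
                i++;
            } else {
                keys.add(best);
                i += bestLen;
            }
        }
        return keys;
    }

    @Override
    public int compare(String a, String b) {
        List<Integer> ka = keys(a);
        List<Integer> kb = keys(b);
        int n = Math.min(ka.size(), kb.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(ka.get(i), kb.get(i));
            if (c != 0) {
                return c;
            }
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(ka.size(), kb.size());
    }

    /** HYPOTHETICAL ordering: each new letter directly after its base letter. */
    static LetterOrderComparator sample() {
        return new LetterOrderComparator("a", "c", "\u010d", "s", "\u0161", "x", "x\u030c");
    }
}
```

If the decision goes the other way (for example č before c), only the argument list in `sample()` changes; the comparison logic stays the same.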
I'm now trying to get the Gentium Plus font working with FOP, to handle our non-ASCII characters. It is possible to use the "simple method", which involves giving the FOP processor a special config file telling it to parse the system fonts to find the font it needs. However, because I want to be able to make PDF generation a part of the portable webapp, I need to do it the "hard" way as well, so I've started figuring it out.
The process is confusing, because FOP behaviour has changed. It seems to involve three steps:
- Generate a font metrics file for each font you want to use. This was quite hard, because I was trying to follow these instructions, which are out of date, as I reported to the Oxygen forum here; this is what I eventually had to do, at the command line:
java -cp lib/fop.jar:lib/avalon-framework-4.2.0.jar:lib/xercesImpl.jar:lib/commons-logging-1.1.1.jar:lib/commons-io-1.3.1.jar:lib/xmlgraphics-commons-1.5.jar:lib/xml-apis.jar org.apache.fop.fonts.apps.TTFReader /usr/share/fonts/truetype/gentium-plus/GentiumPlus-R.ttf GentiumPlus-R.xml
java -cp lib/fop.jar:lib/avalon-framework-4.2.0.jar:lib/xercesImpl.jar:lib/commons-logging-1.1.1.jar:lib/commons-io-1.3.1.jar:lib/xmlgraphics-commons-1.5.jar:lib/xml-apis.jar org.apache.fop.fonts.apps.TTFReader /usr/share/fonts/truetype/gentium-plus/GentiumPlus-I.ttf GentiumPlus-I.xml
- Create a FOP configuration file which tells it where the fonts are.
- Call FOP and point it at that configuration file.
Still working on the second and third steps...
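As a rough idea of where steps two and three are likely to end up, here is a minimal sketch that writes a FOP user configuration registering the two Gentium Plus metrics files generated above. The element layout follows the documented FOP configuration format, but everything specific is an assumption: the class name, the output file name `fop.xconf`, the relative `fonts/` directory, and the family name "Gentium Plus" used in the font triplets (it has to match whatever `font-family` the XSL-FO uses).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Writes a FOP user configuration that registers fonts by metrics file (a sketch). */
class FopConfigSketch {

    /** Returns one font element pairing a metrics file with the TrueType file to embed. */
    static String fontEntry(String metricsUrl, String embedUrl, String family, String style) {
        return "        <font metrics-url=\"" + metricsUrl + "\" kerning=\"yes\" embed-url=\"" + embedUrl + "\">\n"
             + "          <font-triplet name=\"" + family + "\" style=\"" + style + "\" weight=\"normal\"/>\n"
             + "        </font>\n";
    }

    /** Builds the whole configuration; fontDir is an ASSUMED location, e.g. "fonts/". */
    static String config(String fontDir) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<fop version=\"1.0\">\n"
             + "  <renderers>\n"
             + "    <renderer mime=\"application/pdf\">\n"
             + "      <fonts>\n"
             + fontEntry(fontDir + "GentiumPlus-R.xml", fontDir + "GentiumPlus-R.ttf", "Gentium Plus", "normal")
             + fontEntry(fontDir + "GentiumPlus-I.xml", fontDir + "GentiumPlus-I.ttf", "Gentium Plus", "italic")
             + "      </fonts>\n"
             + "    </renderer>\n"
             + "  </renderers>\n"
             + "</fop>\n";
    }

    public static void main(String[] args) throws IOException {
        // The file name fop.xconf is an assumption; any name works as long as FOP is pointed at it.
        Files.write(Paths.get("fop.xconf"), config("fonts/").getBytes(StandardCharsets.UTF_8));
    }
}
```

With the file written, the third step should then be something like `fop -c fop.xconf -fo input.fo -pdf output.pdf` on the command line, where `input.fo` and `output.pdf` are placeholder names.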
Wrote, tested and delivered a utility XSLT file for handling problems in the glottal.xml file, per SMK's request.
I've started work on the XSL-FO/PDF generation code, working with some test files for which I've auto-generated the orthographies. I have a basic layout done, for letter-sized paper, and a parameter system built in which enables me to add other paper sizes later. I'm working with FOP, because if we can get what we want with it (and it's looking good so far -- columns work) then we can deploy anywhere. The biggest hurdle right now (after reminding myself of how page-masters and sequences work) is getting the Unicode characters to display correctly. That's next on my list.
More refinements to the orthography generation. We've now decided that we should base the orthography on the hyph rather than the pron, because that way we can perhaps insert intrusive schwas more easily; morpheme boundaries appear to be significant for this purpose. I'm also orthing the phonemic phrases in citations, and since these are partly hyphenated, I'm handling them slightly differently, splitting on the morpheme boundaries, but using the same conversion code. I have a hook in for the schwa insertion if we can formalize the rules for it.
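The shape of that approach, sketched in Java rather than the XSLT actually used: split on morpheme boundaries, run every morpheme through one shared conversion routine, give a schwa-insertion hook the chance to see the morphemes before they are rejoined. The class name, the hyphen as the boundary marker, and the two-entry mapping in the usage note are all invented for illustration; the real conversion table is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Converts hyphenated phonemic forms morpheme by morpheme (a sketch, not the project's XSLT). */
class MorphemeOrthographySketch {
    private final Map<String, String> table;
    private final UnaryOperator<List<String>> schwaHook;

    /**
     * @param table     phonemic-to-orthographic replacements (HYPOTHETICAL contents)
     * @param schwaHook called with the converted morphemes before they are joined,
     *                  so a future schwa-insertion rule can see the boundaries
     */
    MorphemeOrthographySketch(Map<String, String> table, UnaryOperator<List<String>> schwaHook) {
        this.table = table;
        this.schwaHook = schwaHook;
    }

    /** Shared conversion code: longest-match replacement within a single morpheme. */
    String convertMorpheme(String morpheme) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < morpheme.length()) {
            String bestKey = null;
            for (String key : table.keySet()) {
                boolean longer = bestKey == null || key.length() > bestKey.length();
                if (!key.isEmpty() && longer && morpheme.startsWith(key, i)) {
                    bestKey = key;
                }
            }
            if (bestKey == null) {
                // No rule applies: copy the character through unchanged.
                out.append(morpheme.charAt(i));
                i++;
            } else {
                out.append(table.get(bestKey));
                i += bestKey.length();
            }
        }
        return out.toString();
    }

    /** Splits on the boundary marker (ASSUMED to be "-"), converts, applies the hook, rejoins. */
    String convert(String hyphenated, String joiner) {
        List<String> parts = new ArrayList<>();
        for (String morpheme : hyphenated.split("-")) {
            parts.add(convertMorpheme(morpheme));
        }
        return String.join(joiner, schwaHook.apply(parts));
    }
}
```

With an illustrative table mapping ə to e and ʔ to an apostrophe, and `UnaryOperator.identity()` as the hook, `convert("kə-ʔa", "")` gives `ke'a`, while passing `"-"` as the joiner keeps the boundaries visible, which may suit the partly hyphenated citation phrases.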
Wrote the orthography-generating XSLT, and tested and refined it with SMK working on the l-affric file. The only outstanding questions are: what to do with dotted n (change to nn, or keep as-is, or remove the dot); and how/whether to insert the extra schwas we see in older examples of the orthography.
Also tested some XQuery to determine how practical it will be to link some morphemes in hyphs automatically to their source morpheme. There are many instances where a particular string has only one existing morpheme link, so there are lots of candidates. My XQuery could be used to build a lookup table for all instances of a string which has a single existing corresp, and we could use that to auto-link a lot of m elements.
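The lookup-table idea is simple enough to sketch outside XQuery. The following Java version assumes each m element has been reduced to its text plus its corresp value (null when unlinked); the class and field names are made up for the example. A string makes it into the table only if every existing link for it points at the same target.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Builds a string-to-target table for morphemes whose existing links are unambiguous (a sketch). */
class SingleLinkLookupSketch {

    /** One m element: its text and its corresp target, or null if it is not linked yet. */
    static final class Morpheme {
        final String text;
        final String corresp;

        Morpheme(String text, String corresp) {
            this.text = text;
            this.corresp = corresp;
        }
    }

    /** Returns only those strings for which exactly one distinct corresp value exists. */
    static Map<String, String> buildLookup(List<Morpheme> morphemes) {
        // Collect every distinct link target seen for each string.
        Map<String, Set<String>> targets = new TreeMap<>();
        for (Morpheme m : morphemes) {
            if (m.corresp != null) {
                targets.computeIfAbsent(m.text, k -> new TreeSet<>()).add(m.corresp);
            }
        }
        // Keep the strings with a single target; ambiguous strings need a human decision.
        Map<String, String> lookup = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : targets.entrySet()) {
            if (e.getValue().size() == 1) {
                lookup.put(e.getKey(), e.getValue().iterator().next());
            }
        }
        return lookup;
    }
}
```

Unlinked m elements whose text appears in the resulting map are the auto-link candidates; anything with two or more distinct targets is left alone.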
Continued refining the names page, in response to feedback, including adding print settings in the stylesheet to hide the menu stuff. Also added counts of entries into the status page, so that we can see where we're at.
On the plane yesterday I worked on the rewrite of the Names page, which is now working. It's a sortable table, showing all things tagged as names, along with links to their entries. In the process I brought in some JS for the table sorting, and modularized some of the entry link display code. I also found some bits of code which look obsolete, that I might be able to get rid of; I should do that asap. Uploaded the new code this morning, then worked on a bug with the menu display caused by my changing some of the GET params.