Conversion of LaserGreekII text: great progress
Posted by mholmes on 17 Mar 2009 in Activity log
Made huge progress on converting the LaserGreekII text for PC in GRS. This is what I did today:
- Took the spreadsheet showing the character substitution tables Greg found, and exported it to XHTML. Cleaned up the XHTML a bit manually.
- Wrote XSLT to generate text output from the XHTML consisting of a series of single-character substitutions (suitable for XPath translate) and string-pairs (where multiple characters are involved), and made the XSLT format the latter as a stream of embedded XPath replace function calls, with the longest search strings at the innermost point.
- Built a conversion XSLT stylesheet with a match element which calls the embedded replaces, bracketed by a final translate on the outside.
- Spent hours manually tuning this massive block of code to do all the requisite character escaping; this involves backslash-escaping all instances of \, [, ], {, } and |, and extracting anything which involves apostrophes or quotes into <xsl:variable>s. Eventually it validated.
- Exploded the ODT file containing the document, and extracted the content.xml file from it. This contains all the actual document contents.
- Analysed the contents of the document to figure out a good match attribute value to catch all the instances of text using LaserGreekII. The result was this:
<xsl:template match="text:*[//style:style[child::style:text-properties/@style:font-name='SymbolGreekII']/@style:name = ./@text:style-name]/text()">
- Ran the conversion on the content.xml file.
- Copied the ODT file, and opened it with the archive manager, then replaced the original content.xml file inside it with my transformed version.
- Opened it in OpenOffice.org, and was amazed to see that it seems to have worked first time. We'll have to do some serious checking, and I'm sure there will be some tweaks that need doing, but it's pretty much there. It still specifies LaserGreekII as the font for all the Greek bits, but since that font isn't on my system, it happily substitutes something else. We'll have to pick a decent font to use, in consultation with PC, and then I'll build that into the XSLT, and away we go.
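
The "longest search strings innermost" ordering and the backslash-escaping can be illustrated outside XSLT. What follows is a minimal Python sketch, not the stylesheet described in this post: the function names and sample pairs are hypothetical, the quoting is naive (anything containing an apostrophe would still have to go into an <xsl:variable>), and the escaped set is the characters listed above plus the other regex metacharacters, which is an assumption.

```python
# Assumption: the characters named in the post (\ [ ] { } |) plus the
# remaining regex metacharacters, added here for safety.
REGEX_SPECIALS = set("\\[]{}|.*+?()^$")


def escape_pattern(search):
    """Backslash-escape characters that are special in an XPath regex."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in search)


def nested_replace(pairs, inner="."):
    """Wrap `inner` in one replace() call per (search, replacement) pair.

    The longest search strings are wrapped first, so they end up innermost
    and are applied before any shorter string that is a prefix of them.
    Replacement strings are inserted as-is; '$' and '\\' in them are not
    handled by this sketch.
    """
    expr = inner
    for search, repl in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        expr = "replace({}, '{}', '{}')".format(expr, escape_pattern(search), repl)
    return expr


def build_expression(singles, pairs):
    """Bracket the nested replace() calls with a single outer translate()."""
    src = "".join(s for s, _ in singles)
    dst = "".join(d for _, d in singles)
    return "translate({}, '{}', '{}')".format(nested_replace(pairs), src, dst)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real substitution table.
    print(build_expression([("x", "y")], [("ab", "1"), ("abc", "2")]))
```

Because the outer translate runs on the output of the replace calls, any character a replace produces is itself subject to the single-character table, so the two tables need checking against each other.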
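
The unpack-and-repack steps were done by hand with an archive manager, but they could also be scripted. This is a rough Python sketch using the standard zipfile module; the function name and file paths are hypothetical, and it assumes the transformed content.xml is already available as bytes.

```python
import zipfile


def replace_content_xml(src_odt, dst_odt, new_content):
    """Copy src_odt to dst_odt, swapping in new_content for content.xml."""
    with zipfile.ZipFile(src_odt) as src, zipfile.ZipFile(dst_odt, "w") as dst:
        for info in src.infolist():
            if info.filename == "content.xml":
                data = new_content
            else:
                data = src.read(info.filename)
            # Keep each entry's original order and compression method, so
            # the "mimetype" entry stays first and uncompressed.
            dst.writestr(info, data, compress_type=info.compress_type)


if __name__ == "__main__":
    # Hypothetical file names.
    with open("content-converted.xml", "rb") as f:
        replace_content_xml("original.odt", "converted.odt", f.read())
```

Writing to a new file rather than modifying the archive in place leaves the original ODT untouched for comparison.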