Transformer is a great tool for working on Unicode texts, but today I hit a problem: I needed to work on Hot Potatoes data files, which are not actually Unicode; they're ISO 8859-1 files in which every character above 127 is escaped as a numeric entity.
It seemed a shame not to be able to work on those files, since the actual underlying format is Unicode (they use Unicode characters, but encode them as numeric entities). So I made a couple of changes to Transformer to make that possible. First, I analysed the file load routines: there are two, one for the source text in the main screen, and one used to load files during batch operations. These were both failing to load plain ASCII files, because they appear to be UTF-8 with no BOM, but they're not. So I added a couple of lines so that, in the event of a failure to load a file as UTF-8, it will be loaded as plain ASCII and then turned into a WideString.
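The fallback described above can be sketched roughly as follows. This is an illustration in Python rather than the actual Delphi code in Transformer; the function name `decode_text` is hypothetical, and using Latin-1 as the single-byte fallback is an assumption based on the Hot Potatoes files being ISO 8859-1.

```python
import codecs


def decode_text(raw):
    """Decode file bytes as UTF-8, falling back to a single-byte encoding.

    Hypothetical helper: the fallback encoding (Latin-1) is an assumption.
    """
    # A UTF-8 BOM, if present, is stripped before decoding.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: treat each byte as one character instead.
        return raw.decode("latin-1")
```

Latin-1 maps every byte to a character, so the fallback branch cannot fail; the cost is that a file in some other single-byte encoding would load without error but with the wrong characters.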
Secondly, I needed a way to save a file as ASCII or ANSI, but deal with any characters over 127. I added an option to the batch window to save as "ASCII with numeric entities", which escapes all characters above 127 to HTML-style numeric entity references, and then saves as ASCII.
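As a rough illustration of what "ASCII with numeric entities" means, here is a minimal Python sketch. It is not the Delphi routine used in Transformer; the function name `to_ascii_with_entities` is hypothetical, and the choice of decimal rather than hexadecimal references is simply what Python's built-in `xmlcharrefreplace` error handler produces.

```python
def to_ascii_with_entities(text):
    """Return ASCII bytes, with every character above 127 written as a
    decimal numeric character reference such as &#233;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# "é" is U+00E9 (233) and "€" is U+20AC (8364).
print(to_ascii_with_entities("caf\u00e9 \u20ac"))  # b'caf&#233; &#8364;'
```

Characters in the ASCII range pass through untouched, so an existing `&` in the text is not escaped by this step.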
This all seems to be working well, but it needs to be documented, and the same save function should also be added to the output text save dialog and routine, for symmetry. Adding this as a task so that I get around to doing that.
According to Paolo Cutini, the "Files changed:" message that shows up after adding/removing a UTF8 BOM cannot be translated.
The task below is complete; it turned out I needed to make an explicit call in the toolbar resize event to reposition the sequence TTntListView control. Fixed the bug, built a new installer, and released version 220.127.116.11. Also fixed links on the Transformer and Image Markup Tool sites to the project blogs (which have now changed location).
Testing on various platforms, especially on Vista yesterday, where we set up a non-standard font/DPI setting, reveals a minor bug in the toolbar sizing in Transformer. It seems to afflict the top left toolbar (sequences) in the main screen. When icons are set to larger than 24px, the toolbar does not auto-size. This might just be the AutoSize setting being false (check all toolbars in the app), but it could also relate to the resize code handling the display of the grid component below it.
Completed 7 Feb: As was done with IMT, we need to integrate the dll-based version of the Nuvola icon set into the application, rather than hard-including the icon resources, so that the dll can be built separately and replaced by anyone who wants to work on it.
Created a workaround for this Feb 7.
Had some correspondence over the weekend with Dieter Köhler (author of the OpenXML code I'm using to save and load files), and he confirmed that the XML 1.0 specification disallows characters in this range. The question now is how to handle situations in which people insert these characters. The XDom code is asymmetrical in that it fails to raise any error when saving a node containing illegal characters, but it does raise an error when trying to read them back in; I need to allow for this by somehow escaping these characters myself.
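One possible way to escape such characters before saving is sketched below in Python. This is not the workaround actually implemented in Transformer, whose details are not recorded here; the backslash-based escape format and the function names are assumptions made for illustration. A private scheme is used because XML 1.0 also forbids numeric character references to these characters, so something like `&#28;` would not help.

```python
import re

# Characters XML 1.0 does not allow: C0 controls other than tab, LF and CR,
# plus U+FFFE and U+FFFF. (Lone surrogates are not handled in this sketch.)
_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")
_ESCAPED = re.compile(r"\\(\\|u[0-9A-F]{4})")


def escape_illegal(text):
    """Hypothetical scheme: double existing backslashes, then write each
    illegal character as backslash + 'u' + four hex digits."""
    text = text.replace("\\", "\\\\")
    return _ILLEGAL.sub(lambda m: "\\u%04X" % ord(m.group()), text)


def unescape_illegal(text):
    """Reverse escape_illegal after the XML has been read back in."""
    def restore(match):
        token = match.group(1)
        return "\\" if token == "\\" else chr(int(token[1:], 16))
    return _ESCAPED.sub(restore, text)
```

Doubling the backslashes first is what makes the round trip safe: text that already happens to contain a literal backslash followed by `u001C` comes back unchanged rather than being turned into a control character.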
Posted time spent researching this and checking into my code. Also made it a task to add the relevant code to the app.
Complete: created a reasonable workaround for this on Feb 7.
Paolo Cutini is using Transformer to recover some old WordStar word-processor files, and encountered a problem reloading the sequence file he had saved. The file appears to be corrupted; a control character occurs throughout the document. oXygen reports:
F An invalid XML character (Unicode: 0x1c) was found in the element content of the document.
That character is "INFORMATION SEPARATOR 4" or "file separator". It wouldn't normally be found in a Unicode document. However, that character is in the Unicode specification, so it ought to be somehow encoded in a format that UTF-8 can handle. This may be a limitation of the XDom engine I'm using for XML file handling, it could be a bug in my code, or it could be that I should automatically exclude control characters on the basis that they shouldn't show up in a text document. I'll look into it. Transformer is intended for working on Unicode texts, rather than ancient word-processor formats, but I do like the idea of using it to retrieve this old data; the program was written as part of a project to rescue some old DOS WordPerfect data, after all.
Entering this as a task with a long deadline, because it's not a major thing; what needs to be done is to investigate the code which uses XDom to save files, and see if the file data is being correctly encoded in UTF-8; if so, look into the specs and see if UTF-8 is supposed to handle this character, and if so, whether it should be somehow encoded or escaped.
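The two questions in that investigation can be separated with a small experiment, shown here in Python with the standard library's expat-based parser rather than XDom, so it only illustrates the general behaviour. The helper name `parses_as_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

FILE_SEPARATOR = "\x1c"  # U+001C INFORMATION SEPARATOR FOUR

# UTF-8 itself has no difficulty with this character: it is a single byte.
utf8_bytes = FILE_SEPARATOR.encode("utf-8")


def parses_as_xml(document):
    """Return True if the string is accepted by the XML parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(utf8_bytes)                                        # b'\x1c'
print(parses_as_xml("<p>ab</p>"))                        # True
print(parses_as_xml("<p>a" + FILE_SEPARATOR + "b</p>"))  # False
```

So the encoding step is not where things go wrong; the character is rejected at the XML level, whatever encoding the file uses.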
Did a new release of the program (18.104.22.168) incorporating:
Following the development of a PDF documentation system built on the existing DocBook Help system, another release should be made, also incorporating fixes for any bugs which emerge before the end of the year.
Help for Transformer is built using a system we are developing which incorporates the Image Markup Tool (for interactive screenshots) and DocBook files. Currently this generates only an interactive HTML Help system, but it will eventually also generate printable PDF documentation. Transformer is the testbed project for this documentation system, which will be used to document all our projects in the future.
Transformer is an open-source Windows application written in Delphi 2005 by Martin Holmes. Transformer loads Unicode text files and performs sequences of search-and-replace operations on them. It provides you with an interface to create and test these sequences of search-replace operations before running them in batch mode on a set of files.