Some time ago Stew asked me to do some work on a jwplayer instance for the Thomson mystery. I did the work then, but we recently needed to make some basic changes and hack the surrounding page to make things work in the site.
Category: "Activity log"
Using the new v3 for work on the Arabic writing site, I found a bug whereby the trailing backslash was missing from the backup folder, causing files to be backed up one folder up in the tree. Fixed the bug and tested successfully.
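The failure mode is easy to reproduce. Here is a minimal Python sketch of it, not the application's own Delphi code; the folder and file names are made up for illustration. (In Delphi, the SysUtils function IncludeTrailingPathDelimiter serves the same purpose as the helper below.)

```python
def with_trailing_backslash(folder: str) -> str:
    """Return folder with exactly one trailing backslash (Windows-style path)."""
    return folder.rstrip("\\") + "\\"

backup_folder = r"C:\work\backup"   # hypothetical folder, no trailing backslash
file_name = "notes.txt"             # hypothetical file name

# Plain concatenation lands the file in the parent folder under a fused name.
buggy_target = backup_folder + file_name

# Normalizing the folder first puts the file where it belongs.
fixed_target = with_trailing_backslash(backup_folder) + file_name
```

Here buggy_target points into C:\work rather than C:\work\backup, which is the "one folder up" symptom.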
JJ reported a problem with the site today. The eXist service wasn't running so I started it, but the experimental map wasn't running. Toolbar and so forth appear but no map.
I've tailed the access and error logs and find no errors.
Using 3.0 to help mark up a ScanCan doc, I noticed that changing the Checked Only checkbox didn't set the Modified flag on the current file; fixed that, and then fixed a bug resulting from that fix. Also added a new "secret" Action invoked by F9 which takes content directly from the clipboard, transforms it, and places it back on the clipboard, also showing it in the GUI (there was a way to do this without the GUI before, but when you don't see anything happen, it's a bit difficult to know if you hit the right key or not).
Trying to use version 3 of Transformer for marking stuff up in ScanCan, I hit a bug whereby the first time the app starts (when the layout XML files are not there, so it's starting from zero), the resize code which ensures that columns in the main TListView are not resized out of existence or out of view does not work properly, because the initial values for column widths are not readable, or not read. Added a get-out clause in that resize routine so it detects that situation and does nothing; that allows the app to start up with columns not too badly set up, and thereafter you can resize normally, and it will save and read back correct column size values. That makes Transformer 3 usable on Wine.
Working with Transformer in my workshop today, I discovered that the JavaScript engine, which does some automatic code-checking and gives you an error message when your JS code is wrong, was giving me an error message in garbled Chinese. This was doubly confusing because I was actually working on Chinese texts at the time. It turns out that the JavaScript Bridge code I'm using, which was written well before Delphi 2009 was on the horizon, has lots of PChars in its code. In D2009, a PChar is a PWideChar, but the text is actually not a WideString, it's just a plain Ansi string. Rather than hack the JS Bridge code, which is a bit fragile and unmaintained, I just cast the "PChar" return from the JS engine to a PAnsiChar and then to a string: string(PAnsiChar(message)). This seems to do the job. The error messages will always be in English anyway.
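Why the message looked Chinese can be shown outside Delphi. This is a small Python sketch with an invented message text, not the JS Bridge code itself: reading single-byte English text as 16-bit characters pairs the bytes up, and many such pairs fall in the CJK range.

```python
message = "Syntax error"                # hypothetical engine message
ansi_bytes = message.encode("ascii")    # single-byte text, as the engine returns it

# Wrong reading: treat the buffer as UTF-16, as a wide-character pointer would.
garbled = ansi_bytes.decode("utf-16-le")

# Right reading: treat the buffer as single-byte text, as the PAnsiChar cast does.
recovered = ansi_bytes.decode("ascii")
```

The twelve bytes become six characters in the wrong reading, all of them Chinese ideographs, while the right reading returns the original text.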
I also discovered today that my "is it UTF-8?" detection code gives a false positive on GB18030 files (PRC Chinese encoding). This is annoying, but there's not much I can do about it. It just means that when opening such files in Transformer, it always suggests UTF-8 instead of GB18030.
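The ambiguity is inherent in the byte patterns rather than in any particular detector. As a minimal Python illustration (the byte pair is chosen for the example, not taken from a real file), the same two bytes decode without error under both encodings:

```python
data = bytes([0xC4, 0xA3])       # one two-byte GB18030 character

as_gb = data.decode("gb18030")   # the intended reading: one Chinese character
as_utf8 = data.decode("utf-8")   # also well-formed UTF-8: U+0123, a Latin letter
```

A validity check alone cannot tell these readings apart. A longer text is less likely to be well-formed UTF-8 throughout, but the possibility remains.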
Since I now have robust text file encoding conversion available, I think it would make sense to make this a batchable feature. I'll need to think about how best to do that, but it could be a part of the main batch screen, where you could specify the input encoding. There should be a new, initial tab entitled "Loading files", which should have a drop-down where you can specify the input encoding. You could discover this using the normal file-loading capabilities in the Source tab.
Fixed a bug I found in Transformer when I was using it to prep some texts in the workshop yesterday.
Wrote the back-end code to check input, run XSLT transformations and save the results; added appropriate error messages to the error-logging code for when failures occur, and did some basic testing. Everything seems to be working. This basically means the application is code-complete. Next:
- A serious comprehensive test with a big batch of big files, including Unicode. Encoding is predicted to be potentially problematic; we need files in more than one original encoding.
- Updating of the Help system.
- Building a new installer for a beta release.
- Adding the beta to the website.
- Emailing users who might test it.
- Final release.
Completed all the dialog box functionality for the new XSLT transformation item type. Created a couple of new icons and added them to nuvola.dll, one for an XML document and one for adding an XML document; the latter is used for "Add new XSLT Transformation Item" in Transformer 3. Spent a little time making the file paths for external XSLT files robust; I'm storing both a relative and absolute path for referenced external XSLT files, so that if the sequence file is moved, and the XSLT file is moved relative to it, the absolute path can be reconstructed on load. This is working well.
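The two-path idea can be sketched as follows. This is a Python illustration rather than the Delphi implementation: the function name, the lookup order (relative first, then absolute) and the exists callback are all assumptions made for the example.

```python
import ntpath  # Windows path rules, usable on any platform

def resolve_xslt_path(sequence_file, stored_relative, stored_absolute, exists):
    """Find a referenced XSLT file after the sequence file may have moved.

    Tries the stored relative path against the sequence file's current
    folder, then the stored absolute path; returns None if neither exists.
    """
    base = ntpath.dirname(sequence_file)
    candidate = ntpath.normpath(ntpath.join(base, stored_relative))
    if exists(candidate):
        return candidate
    if exists(stored_absolute):
        return stored_absolute
    return None
```

Passing the existence check in as a function keeps the sketch testable without touching the disk; real code would simply ask the file system.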
Now the only thing left to do is the integration of the actual XSLT transformation, using the xsltproc dll from libxml2. This code can be adapted from the Image Markup Tool, and the adapted code will then be re-used in the IMT version 2 when I write it.
I've added the new item type to handle XSLT transformations, and written all the i/o code for it. I've started work on the dialog box for entering and editing that data, and it shouldn't take much longer to finish it. I've also done a bit of testing with the existing code, and confirmed that both PERL RegEx replacements and JavaScript transformations work well, with Unicode characters embedded in them (I used Japanese for testing).
The testbed for this app will probably be the Perseus dataset, which is in an old version of TEI (P4 or P3, not sure which). We'll want to massage it into P5, and at the same time add some extra markup and do some fixes. The data will then be used for the GRS myth-mapping project.
Converted over the dpr and the main application form. Also brought in a stripped-down version of the old SystemFunctions.pas, and did a huge amount of rewriting of the file i/o functions in the application. I'm also using the SelectFileEncoding functions and dialog box I wrote for the Apparatus application when loading files. Transformer 3 now builds and runs, but of course there's a lot of testing to do, and many issues where character encoding will need to be carefully checked. I also have a problem with the main menu having disappeared; it's there, but it seems to be covered up by other components for some reason. Still working on that.
Next link in the chain...
Began focusing on the more Transformer-specific libraries for porting to Delphi 2009, specifically the ReplacePair form/unit, and then the TransformItems library, which defines the core classes of transformation items (currently only TReplaceItem and TScriptItem, but soon to include TXSLTItem).
The ReplacePair form was pretty straightforward, but there was one problem with TPerlRegEx, which I'm using to replace TURESearch. Compilation would fail with a fatal error, for no discernible reason. I eventually tracked down this bug, and was able to use the workaround (adding a pointless call to the Match method).
Moving into TransformItems.pas, I started to hit the inevitable problems that come with using the open-source JavaScript Bridge library, which has lain untended since early 2004. I had to make one tweak to js15decl.pas, to overcome an ambiguity with StrDispose, which now has two variants, one for PAnsiChar and one for PWideChar. I chose the latter version, so it now compiles, but I have no idea whether it will work or not. However, I did also find this project: (http://code.google.com/p/rawfpcjs/) (blog is blocking Google URLs in links at the moment), which is very recent, and looks promising; if there are problems with the use of JSBridge in D2009, I can probably move to this alternative, and it might even make life simpler. There's still a lot of work to do on TransformItems, and then on the individual ScriptItem units, but I'm getting close to working on the actual application, finally.
Added the code to map the old output filename variables onto the new template system. The Batch window functionality is now complete.
Of the three items listed in the previous posting, I've fixed the menu positioning issue, and I've cleared out the old unused message TStaticText controls. I've also made a start on a version conversion system for batch files, by adding a version element to the output file, and checking its presence/absence/value when loading a file. Now I just need to add the code for converting the complicated old settings (5 variables!) to the new single OutputFilenameTemplate string value.
Finished the output filename mask code, with the live demo of the mask in action and the file i/o fully tested. A couple of things remain:
- That popup menu is still in the wrong place.
- I haven't decided how to handle old Transformer files; if it's even possible to map the old setup onto the new, I should try it, but I should also warn people when they open an old file that they should review the filename settings.
- I probably haven't cleaned out all the obsolete messages stored in labels in the Batch form.
This stuff shouldn't take long, and then I'll be in a position to take on Transformer itself, starting with the data structure for replace items.
As part of rewriting the Batch file processing screen, I've been looking closely at the clunky old system by which the results of files were saved to a designated location and name. I'm going for a placeholder-based system similar to that used in oXygen's transformation scenarios, but a bit more flexible (it has more human-readable placeholders, and has one for the original file extension). This also integrates with controls for choosing an actual folder as part of the filename. I have most of the work done, but there are some GUI issues to fix -- I need to add a plain folder image to the Nuvola dll, add images to the popup menu, and figure out a problem with the position of the display of the popup menu, which is currently in the wrong place. I've also refactored a lot of the original code, renaming components to remove the leading "u" which was used to designate a TTntUnicodeControls component.
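A placeholder-based filename template might work along these lines. This is a Python sketch only: the placeholder names and the function are invented for the illustration and are not the ones Transformer uses.

```python
import ntpath  # Windows path rules, usable on any platform

def expand_template(template: str, input_path: str) -> str:
    """Fill a filename template from the parts of the input file's path.

    The placeholder names are hypothetical, chosen to be human-readable.
    """
    folder, file_name = ntpath.split(input_path)
    stem, ext = ntpath.splitext(file_name)
    values = {
        "[[inputFolder]]": folder,
        "[[inputName]]": stem,
        "[[inputExtension]]": ext.lstrip("."),
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template
```

A template such as [[inputFolder]]\out\[[inputName]]_new.[[inputExtension]] then sends C:\texts\letter01.txt to C:\texts\out\letter01_new.txt, keeping the original extension.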
A great addition to the application, so I've rebuilt the installer (same version, of course -- there's no change to the executable), and updated the documentation and the Web site. I've also updated the roadmap and future features information to show my current plans for the application.
I found this excellent implementation of the PCRE library wrapped for Delphi by Jan Goyvaerts. The Delphi code is MPL 1.1, and the PCRE engine is BSD, so it's all usable in any of our projects, and it's perfect for Transformer because it handles UTF8. I initially compiled and installed the component, which is designed to work with the dll that's shipped with it, but every time I destroyed a created instance of the component, I was getting access violations, so I edited the source to link to the C object files instead of linking to the dll. This uses a slightly newer version of PCRE, and more important, it doesn't generate the AVs. Built a test app, wrapping the component in my own class to suit what Transformer will want to do. Everything works fine!
So yet another mighty step forward: now I can provide good Unicode regular expression support, without having to map it to the JavaScript library.
Made fantastic progress today. This is basically what I've implemented:
- Several LoadFileToString functions with a range of different input parameters.
- Functions for detecting the character set on load, including peeking into XML and HTML headers, and detecting UTF-8 byte-sequences.
- An inventory (TDictionary) of code pages known to Windows, which make it possible to look up any code page id found in a header.
- A dialog box which will let you test any of the different code pages against your text until you find the right one, with live conversion and font control.
- Lots of testing with a variety of languages, encodings and BOMs.
With the exception of UTF32, I now have all of this stuff working. I'll have to add the UTF32 handling, and then work on finding a decent open-source implementation of regular expressions for Delphi. At some stage, it might be worth trying to take the broken port of Mozilla code, which has functions for recognizing likely ANSI encodings by their byte sequences, but that might be overkill.
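Peeking into a file's header for a declared encoding can be sketched like this. It is a Python illustration, not the Delphi code: the regular expressions and the function name are assumptions, and the approach only works when the header itself is in an ASCII-compatible encoding.

```python
import re

# Encoding named in an XML declaration: <?xml version="1.0" encoding="..."?>
XML_DECL = re.compile(rb"<\?xml[^>]*\bencoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']", re.I)
# Charset named in an HTML meta tag, in either the old or the short form.
HTML_META = re.compile(rb"<meta[^>]*\bcharset\s*=\s*[\"']?([A-Za-z0-9._-]+)", re.I)

def declared_encoding(data: bytes, probe: int = 1024):
    """Return the lower-cased encoding name declared in the header, or None.

    Only the first `probe` bytes are searched, and matching ignores case.
    """
    head = data[:probe]
    for pattern in (XML_DECL, HTML_META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

The returned name would then be looked up in the code page inventory to pick the conversion to apply.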
This really has been hard, but quite rewarding, and infinitely valuable. I can add to Transformer the ability to specify an input code page as well as an output encoding.
Delphi 2009 has better file i/o for Unicode text files than any previous version, but there are still lots of holes. It's good at loading a file which has a BOM, but if there's no BOM, it just uses the system's default encoding. I need to do much better than that, so I'm writing a lot of new code, and repurposing old code, to make that happen. What I've got so far is a function which automatically detects any UTF8, 16 or 32 BOM, and failing that, checks the bytes of the file to see if it's likely UTF8. Now that's working, I need to go further and check for explicit character encodings named in the preamble of the file itself, in HTML, XHTML or XML files. This will involve assuming ANSI, which is reasonable, and loading it that way, then searching all the likely locations, allowing for case, etc. I've done something a little like this before, but it has to be a bit more bulletproof. Then I have to follow Marco Cantu's example of defining a custom encoding to create the second of the UTF32 encodings, so my apps can load files in UTF32.
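The BOM-then-bytes check described here can be sketched in Python. This is an illustration under my own naming, not the Delphi function; the one non-obvious point is that UTF-32 must be tested before UTF-16, because the UTF-32 little-endian BOM starts with the same two bytes as the UTF-16 little-endian BOM.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes):
    """Return (encoding, bom_length); encoding is None when undetermined."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: if there are non-ASCII bytes and they form valid UTF-8, say UTF-8.
    if any(b >= 0x80 for b in data):
        try:
            data.decode("utf-8")    # strict decoding rejects malformed sequences
            return "utf-8", 0
        except UnicodeDecodeError:
            pass
    return None, 0
```

Pure ASCII input is reported as undetermined, since it reads the same under UTF-8 and the ANSI code pages; the caller can then fall back to a declared or default encoding.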
This was a bit tricky, because of the use of AnsiString types and PChars in the original Pascal header conversions. I have it working, but I don't have much confidence it will work reliably with (say) filenames in Japanese. That will need some testing.
Ported a progress bar dialog box which is required for the batch file window in Transformer to Delphi 2009.
Ported and simplified my DocLauncher console app, which is used to launch help documentation for IMT and Transformer, to Delphi 2009. In the process, I removed dependencies on old libraries (FileFunctions and SHBrowseU) by substituting actual ShellExecute calls for calls to my own wrapper functions.
Updated the TRecentFiles class to use ADOM for saving/loading, and also simplified the constructor (there were two of them, but one was actually enough).
Started working out the graphics needs for e.g. IMT, and found that Delphi 2009 has built-in support for JPEG, PNG and GIF (add JPEG, PNGImage and GIFImg to the uses clause respectively). TIFF and other formats are not supported, though, so I found a port of GraphicEx to Delphi 2009; adding this to the uses clause adds lots of formats including TIFF (but it generates lots of warnings about "unsafe code"). I think adding GraphicEx before the other graphics units in the uses clause will ensure that Delphi code is used for those formats which are handled natively, while GraphicEx code is used for other formats it can handle.
Began porting RecentFiles library. This unfortunately relies on my old-style XML i/o code instead of ADOM, so the disk i/o stuff will have to be rewritten, but it shouldn't take more than an hour or so.
Also, with Greg, began testing D2009 test apps on Darwine. It seems that the Kronenberg download of Darwine (which is at 1.0.1, the same as the Ubuntu version) works perfectly as long as you run your apps from the command line using the wine binary; if you try to use his WineHelper app, they blow up. What this means is that Wine works great on OSX (we even tested with Japanese GUI strings), so we have a viable platform, but we don't have a user-friendly Wine front-end for Mac users. That may change over time, but in any case we can provide a script-based installer that would put the app in the right place, and then create shortcuts that would run it properly. So all three platforms are a go.
More progress porting my code to D2009:
- Ported the Preferences unit and dialog box, in the process creating a Transformer replace sequence that does most of the work on PAS and DFM files.
- Built a test app for Preferences, and tested it.
- Built a universal test app for all my libraries; as each is ported, it'll be added to the universal app, and be tested automatically.
- Tested Unicode (with Japanese) in GUI translation system. Works great.
- Tested the same thing, along with Preferences, in Wine -- again, it works a treat!
I've also, finally, downloaded and built the Help file updates, so I have a working help file. Not that it's much use, in practice.
Ported the translation code, which again proved a little simpler than expected; the test application is working like a charm. The only unexpected thing was that although I thought I'd be looking for properties which were tkString (previously tkWideString), it turns out that I needed tkUString; presumably the TypInfo.pas unit is not quite as "Unicode-everywhere" as the rest of the VCL and RTL. My guess is that this is a result of its hooking into Windows fairly closely, so being dependent on Windows types rather than Delphi types, so tkString has some specific relationship to a Windows string type.
Simplified the FormState library so that it only uses XML files, and also added a new feature so that non-modal forms can be shown at start-up if they were showing at shutdown.
Next, the translation code...
Ported several libraries to D2009 today, including GenFunctions, SplashAbout, VersionInfo, Icons, and FileOverwriteConfirm, and I'm beginning work on FormState, which requires several of the others. Everything is going very smoothly indeed so far.
The next iteration of Transformer needs to get rid of its dependence on the buggy TURESearch component from jclUnicode.pas, which has serious issues. I've been searching for ages for a decent open-source alternative, and now I think I've found one. It requires Delphi 2009, but that's on order (I think), and I do need a relatively straightforward project with which to pilot a move to D2009; Transformer is probably that project. This would be the migration path:
- Get D2009 installed and working.
- Create a new projects tree.
- Migrate Transformer files into it, along with other key libraries from my current tree (such as Batch, Translate, etc.).
- Install XDOM 4.2 (or ADOM, if that's the current name of the appropriate version), and Project JEDI JCL and JVCL.
- For each sub-project (Batch, Preferences, Translate, FormState, SplashAbout, RecentFiles...), create a new test/dev app in which to develop it.
- For each sub-project, remove all TNT dependencies, and rationalize all dependent code so that Unicode strings are now used. Pay special attention to any SystemFunctions, FileFunctions and StringFunctions code which may be invoked. It may be necessary to start a new version of each of those files, into which we only add functions that we turn out to need.
- Once all the dependent projects are working, bring in Transformer and strip out TNT from that.
- Get the PERL RegExp wrapper package and install that in Delphi.
- Rewrite the string replace code based on it.
- Once everything is working, think about adding the XSLT support through libxml, as in IMT.
The port of the huge GRS database from Filemaker to LAMP requires some data massage, so Greg and I wrote a bit of JS in Transformer to do some of it. This threw up again the limitations of the Transformer regexp setup -- I really need to work on that, by moving the regexp option out of the replacement transform, and making instead a wizard for adding a regexp operation as a JavaScript transform. But there's no time...
You can now hold down the control key when pressing the Do Transformations button in Transformer, and it will take input off the clipboard, process it through the transformations, and put the result back on the clipboard. That's really handy for these ad-hoc usages that I find for it in markup projects.
I'm using Transformer to auto-markup some bits of text while working on ScanCan documents, and found it annoying that after every operation in the main screen there's a popup you need to dismiss. Added a command-line parameter, /suppressPopups, which prevents this. May or may not document it. Transformer is deuced handy...
I scanned the poem at 300dpi for EdeR and gave him a PDF of the whole thing.
I also created a layered Photoshop file with the scans and adjusted it so that the layers were properly aligned vertically and horizontally. An overlaid grid can now be used to make quite precise measurements for placing fragments.
Paolo Cutini (as always) completed a new Italian interface translation within hours of the release of version 2.0. Great thanks go to him. I've rebuilt the installer to include the new translation, and posted it on the site.
Version 2.0 of Transformer, a Unicode text transformation tool developed for rescuing old data and transforming text files of all descriptions, has been released. This new version adds JavaScript capabilities to complement the original fast search/replace functionality, making it possible to do much more complex and sophisticated transformation operations on text.
Originally developed in 2006 as part of a project to rescue old DOS word-processor files from a Linguistics project, Transformer has since been used extensively on the Colonial Correspondence project.
Find out more at the Transformer site...
This morning I finished working on the tutorial, including the interactive screenshots, which will also be on the main Transformer Website. It took longer than I expected, because the originals were done with a pre-release (0.9) version of the Image Markup Tool, using a file format which can't easily be converted to any of the release versions, and in any case much has changed in the main interface. In the process of doing this, I also:
- Found and fixed a bug in the column-sorting code, which was supposed to be sorting by length of items for Find and Replace items, but was actually sorting (most of the time) alphabetically.
- Fixed a long-standing annoyance with column sizing in the TTntListView control that displays the sequence items. There's no onresize event for column headers, so they can end up being sized too wide for the control. Now they resize themselves appropriately when you exit the control, or click anywhere on it, which is better than nothing.
- Tweaked a couple of other resize functions, which were causing scrollbars to disappear in some extreme circumstances.
- Fixed a label which should have been updated before.
With luck, I can do a release of the app tomorrow, and then focus on reworking some of the code using threads, to enable monitoring and killing of JavaScript processes that run too long or get into endless loops.
I've now finished the section on scripting, so I just have a handful more topics to update with new screenshots etc.
JS-R is working on providing a new angle on Canadian census data by pre-calculating, and making available through a Web interface, two measures which express how segregated or integrated individual groups are within Canadian cities. He has a contract programmer working on a pilot of this, using PHP and mySQL, and wants us to take over and maintain/extend the project after the first phase is done.
Wrote to sysadmin to get a TAPOR project id and group set up, along with a domain (segregation.uvic.ca). JS-R will provide some basic intro material for the site, which we'll set up ahead of a presentation he's giving on June 3; the site will initially point at the pilot application on the external developer's site, for the purposes of that presentation, and then the code will be moved over to our server. At that point, we'll most likely move from mySQL to Postgres, to take advantage of better support for Views, since the queries are very calculation-heavy.
With more feedback from DB on what individual escape sequences should be converted into, I was able to add a lot more replace sequences to the set that I've developed for his Waterloo Script files. In the process, I noticed a need for a "Clone Transformation Item" feature, because many of the new replacement items were minor variations on old ones, with one character changed. I've now added that feature (including adding a new icon for it -- twin wands -- in the icon library dll). It seems to be working fine.
I've run the extended conversion sequence on DB's files, zipped up the results, and posted them again for him to download.
Ended up just tidying up what I'd done so far (centring images etc.); no time to get any further with it. Back next week.
The tutorial is quite densely-populated with screenshots, all of which have to be redone, so it's quite time-consuming to update it. So far, I've done the first five topics. I'll also have to add another one or two topics to cover the new scripting functionality. Should be done by tomorrow, I hope, in time to do a beta release before the weekend.
Batch operations in Transformer can be very long lists, and sometimes you just want to run a subset of the list, to test things out on a few files, or to re-convert a few files which failed previously, because of something that didn't affect the bulk of them. I've now added an option so that you can select one or more files in the list box of the batch screen, then right-click and choose to transform only those files.
The Help file is now complete for Transformer 2.0, and seems to be working fine. It might need a little more indexing, and perhaps some additional help for the scripting component, but the tutorial might take care of that.
With help from the oXygen forum staff, I reinstalled oXygen 9.2 in a new folder, and removed the old folder; problem solved. It must have been caused by a file not deleted during the pre-install uninstallation of the old version.
Spent most of the afternoon documenting new features in Transformer, with screenshots, and tweaking the app where I came across something less than ideal. At the end, I wanted to do a test build of the help file, but ran into an indecipherable error with Saxon 9B running in oXygen 9.2:
Saxon 9B null
When trying to validate the XSLT stylesheet, I got this equally unhelpful feedback:
SystemID: C:\Documents and Settings\mholmes\My Documents\Borland Studio Projects\my_projects\mdhHelp\mdh_docbook_to_html.xsl Description: net.sf.saxon.style.XSLVariable.getReferenceList()Ljava/util/List;
I don't know from this whether Saxon is broken, or whether it's been updated in oXygen 9.2 to a new version which stumbles over an error it was previously happy to let slide. But this stylesheet was working under 9.1, because I used it to build the IMT Help file, and that no longer builds either. Posted a message on the oXygen forums; we'll see if anyone else has seen it. Failing that, I'll try reinstalling oXygen, and perhaps also running the transformation on other machines.
Several new code libraries had been added to Transformer since the last release, and they were lacking header information, licensing info, descriptions for the Website, etc. The Website source code and requirements page, as well as the installer are now up to date for version 2, but the rest of the site will have to be updated before I can do a release. I need to do this (a beta, perhaps) within the next three weeks.
Working on DB's files, which are numerous and for which the processing is complicated, I found I really wanted to be able to cancel a batch operation, so I've repurposed the Progress form I wrote for Markin 4 to show how the batch is going, and enable the user to press a Cancel button and stop the process. This is a great improvement for large batches. The cancel flag is checked at the end of each replace or script operation, so as long as those operations are completing in good order, it's pretty snappy.
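The checkpoint approach to cancelling can be sketched like this. It is a Python illustration with invented names; in the application the flag would be set by the Cancel button on the Progress form.

```python
import threading

def run_batch(files, operations, cancel: threading.Event):
    """Apply every operation to every file, stopping at the next checkpoint
    once cancel is set. Returns the names of the files fully processed."""
    done = []
    for name, text in files:
        for operation in operations:
            text = operation(text)
            if cancel.is_set():     # checked at the end of each operation
                return done
        done.append(name)
    return done
```

Because the flag is only examined between operations, the response is as quick as the slowest single operation, and an operation that never returns is never interrupted.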
The problem of how to monitor and abort a frozen or looping script operation while it's happening still remains, though. I've been doing a lot more thinking and reading about this, and it seems likely that only something fairly aggressive such as TerminateProcess could do it. The script code would all have to run as a separate thread, and there would have to be a monitor thread with a timer, which was initiated when the process began, and which gave the user the option to kill a script process after a timeout had been reached without the thread terminating. That code could also check for the Cancelled flag, and kill a process when that was True but the process had been running for a (perhaps shorter) timeout. It'll take some work to get this implemented and tested. I'm still not sure there's a clean way to kill a process running in a C++ dll from Delphi without orphaning some resources.
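As a rough illustration of the watchdog idea, here is a Python sketch. It swaps in a different technique from the thread-based design discussed above: the script runs in a separate child process, which can be killed cleanly because the operating system reclaims its resources. The function name and the use of a Python child are assumptions made for the example.

```python
import subprocess
import sys

def run_with_timeout(script: str, timeout_seconds: float):
    """Run Python source in a child process and kill it if it overruns.

    Returns (finished, output): finished is False when the timeout expired.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=timeout_seconds,
        )
        return True, result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False, ""
```

A script that loops forever is stopped after the timeout, at the cost of starting a process for every run. Whether that cost would be acceptable for a large batch is a separate question.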
There are now bridging Delphi / JS variables called JSInputFileName and JSInputFilePath, which are populated with the appropriate values during transformation operations.
In ColDesp, we're making a point of using the original .scx filename as the @xml:id attribute of the XML file created from it. This is easier if the original input filename and path are exposed to the transformation system, so I've added hooks which expose that information through two variables tied to placeholders. If you include ___inputFilePath___ or ___inputFileName___ in your replacement text, the path or name of the file currently being processed will be substituted. I'll add similar hooks into the JavaScript through bound JS variables. This will all need to be documented before the next release.
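The substitution itself is simple. A minimal Python sketch follows; it assumes that the path placeholder stands for the full path and that the name placeholder includes the extension, neither of which is stated above, and the example file name is invented.

```python
import ntpath  # Windows path rules, usable on any platform

def fill_input_placeholders(replacement: str, input_path: str) -> str:
    """Substitute the two placeholders with details of the current file."""
    return (replacement
            .replace("___inputFilePath___", input_path)
            .replace("___inputFileName___", ntpath.basename(input_path)))
```

For example, a replacement text of `<!-- source: ___inputFileName___ -->` applied while processing C:\coldesp\letter01.scx would come out as `<!-- source: letter01.scx -->`.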
Since the beginning of the project, using the Preferences dialog box to control application font settings, there's been a bug in the repainting of TTntListView column headers, which afflicts the main window of the app. This might just be a bug in some graphics drivers, though; it doesn't show up on some other machines. What happens is that when a font is larger than the default, the original lower border of the column headers is left painted, obscuring part of the text. This works around it:
    if LV.ShowColumnHeaders then
    begin
      // Hide and re-show the headers to force a clean repaint.
      LV.ShowColumnHeaders := False;
      LV.Invalidate;
      Application.ProcessMessages;
      LV.ShowColumnHeaders := True;
      LV.Invalidate;
      Application.ProcessMessages;
    end;
I've been trying to find a fix for this for a couple of years.
I've implemented a more sophisticated system for handling out of memory errors:
- TTransformList now has the ability to retrieve and store error messages when executing a script operation.
- The main form routine ProcessExternalFile can now keep track of these as they go by, and report them to the calling procedure.
- The Batch screen can now keep a list of all these errors, and show them to the user via a temp file saved to disk.
- I've confirmed that the operation was running out of memory about 68 files in, so I upped the minimum memory to 1000000 by default, but I've also implemented a system whereby you can pass a memory value to Transformer on the command line (-jsmem=1000000), and if it's larger than the minimum, then the TTransformList's JSEngine will be created with the larger value. This (coupled with proper documentation) will give users a way to get around out-of-memory errors if they occur.
- I had to tweak one of the Italian translation strings, to add a new formatting variable for the number of script runs during a batch operation, since this is now reported at the end of the process. If the placeholder is missing, then you get an exception.
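The command-line override can be sketched as follows. This is a Python illustration of the rule that the larger of the two values wins; the function name is invented, and ignoring malformed values is an assumption of the example.

```python
DEFAULT_JS_MEMORY = 1_000_000   # the default minimum described above

def js_memory_from_args(args, minimum=DEFAULT_JS_MEMORY):
    """Return the engine memory size: the -jsmem value when it is a number
    larger than the minimum, otherwise the minimum."""
    for arg in args:
        if arg.startswith("-jsmem="):
            value = arg.split("=", 1)[1]
            if value.isdigit():     # non-numeric values are ignored
                return max(minimum, int(value))
    return minimum
```

Taking the maximum means a user can raise the allocation but cannot accidentally drop it below a value already known to be too small.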
When processing the batch of schedule files, the JavaScript engine would stop doing any work half-way through, without any error message I could see or exception I could catch. The JSEngine object was created with an initial memory allocation of 100,000. Upping this to 1,000,000 "solves" the problem for this particular batch, but it's hardly a fix; presumably a larger and more complicated batch would at some point cause the same problem. Nothing in the docs helps pinpoint this, or suggests any way to catch this situation and report on it. I'm still working on this.
Working on the ColDesp project, I discovered that at a certain point in working through a batch job, the script stops executing (it does nothing). I'm pretty sure this must be a memory issue. Hopefully it's not a leak in the SpiderMonkey dll. More likely, I'm failing to free something, or I can do a better job of creating and freeing objects so the memory's retrieved when it's needed. This will need a bit of work.
I've also realized that there's only really one way to do monitoring/policing of operations: spawning a thread to do all the JS stuff, and leaving the main thread counting time, with a dialog box always open which enables you to kill the JS thread. That will also take some work.
But the app is now so useful and convenient that it's worth the time. I'm using it for all sorts of little tasks now.
Tried various ways to police the execution of JS code from Transformer, but no joy. No responses to my question either. This will take some work.
On the other hand, I made a lot of progress on the Waterloo Script conversion sequence; I'm now producing valid XHTML files from most of the script files I throw at it. I still have handlers to add, and there are issues such as what to do with index (.ix) commands, which only make sense in the context of a larger document, but if I can get DB to help me reconstruct a complete file for each book, I can operate on the single file. There are also some accented characters I'm not handling, and I think it will be wise to run an XSLT transformation on the end result, to rationalize some of the block elements which handle (for instance) indenting. The transition from serial switch commands to hierarchical XHTML is slightly bumpy.
Started work on converting one of DB's script files from Waterloo Script to XHTML, using the new Transformer. Got quite a long way -- headings, paragraphs, and inline underlines are all handled, and I'm building up a Doc object which can process the input script effectively. XHTML is the best option because it's XML, but it can also be loaded directly into a word processor, which I think is what DB wants. Some points re the app itself:
- Added some counters to give back info at the end of the process on how many scripts were run, as well as the original total of replacements made. The counting is being done in both the single-doc GUI and the file processing for multiple docs, but in the latter case, nothing is returned to the calling function; that reporting will have to be added.
- Added detailed error reporting when code is checked in the Script editing form. Works great! Line, line num, and offending symbol are all reported correctly.
- Discovered that I need to handle such situations as endless loops, which just tie up the app. One option is to add a bound JS function to the beginning of each code block I run, which starts a timer that calls back to Delphi every second or so, so I can monitor and offer to kill the process if it runs too long. But I don't know how to kill it yet. I have a message into the DelphiMoz list about this.
- Minor GUI tweaks.
I started off the day by creating a dialog box for editing the local and global JavaScript text for transform items, and made the mistake of choosing a JEDI component (TWideHLEditor) to create the syntax-highlighting memo components. These caused access violations all over the place. In the end, I removed them, and installed the UniSynEdit package instead. That editor seems better and more flexible, and gives no errors.
Then I got file i/o working, and added JS Bridge code to the dialog box to check errors. I figured out how to get an error report back from JS Bridge (a new feature, not documented properly). That seems to work a treat.
Tested, bugfixed, and tested again. Then I added and tested handling for the old file format.
Now we're ready for a real test, using DB's Waterloo Script files. I also need to make sure the Unicode stuff really works in the UniSynEdit.
Today's progress:
- Finished the code for the transform item classes and the list class.
- Stripped all the old code out of the main.pas unit (where it should never have been anyway).
- Substituted the new classes.
- Refactored to rename all items, objects, methods etc.
- Updated existing method code so that it still works as before (all replacements etc.).
- Added new icons (see previous post), and assigned one to the old aAddReplacePair action.
- Added a new aAddScriptItem action; no method body yet, but it has menu and toolbar items which invoke it.
- Built and tested: old code still works, new code remains to be added.
Remaining to do:
- Create a ScriptItem editing form, with a JS Bridge object which can be used to compile/verify JS code.
- Give it two tabbed panels, one for local and one for global JS.
- Give it Check, OK and Cancel buttons.
- Implement the aAddScriptItem method.
- Complete the aEditTransformationItem action.
- Start testing! Emphasis on the Unicode...
- Update the tutorial.
- Update the Help.
- Update the installer.
- Update the Website codebase.
- Update the Website content and do the release.
Transformer needs four new icons, for Find/Replace, Add New Find/Replace Pair, JavaScript Document and Add New JavaScript Item, so I created those from a Nuvola original, and added them to the set in the dll.
Having established that we have a licence for Visual Studio, so we can distribute the MS dll, and that we can get JS Bridge working, I'm starting into the actual updating of Transformer, taking it to version 2; this will be targeted initially at producing a tool which can do the whole transformation of DB's Waterloo Script files.
Today I wrote a new unit (about 1,000 lines) with four classes: TTransformItem (base class for the next two), TReplaceItem, TScriptItem, and TTransformList. Much of the code for these is adapted from the old TReplaceItem/TReplaceList, but there are new properties and a slightly more complex hierarchy. Everything is done (although not tested) with the exception of TTransformList.ExecuteScript, which will execute script directly. It's a method of the list rather than of the item, because there is a GlobalScript field where global stuff can be stashed, and that'll need to be bundled in with any code from the item.
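To illustrate why the list, rather than the item, runs the script: the global code has to be bundled with the item's own code before anything executes. Here is a rough sketch in plain JavaScript, not the Delphi/JS Bridge code; executeScript and the two sample scripts are invented for the example.

```javascript
// Bundle the list-level global script with one item's script and run the result.
function executeScript(globalScript, itemScript, input) {
  // Global code goes first so the item's code can call anything it defines.
  const body = globalScript + '\n' + itemScript;
  const run = new Function('input', body);
  return run(input);
}

// Hypothetical scripts: the global one defines a helper, the item uses it.
const globalScript = 'function shout(s) { return s.toUpperCase(); }';
const itemScript = 'return shout(input);';

console.log(executeScript(globalScript, itemScript, 'waterloo')); // WATERLOO
```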
Looking forward to getting into the use of JS Bridge...
I got a test app working with JS Bridge, and confirmed that:
- I can compile and execute a script from a string value.
- I can pass data into the script.
- I can retrieve data from the script.
- I can successfully pass in and retrieve WideString values.
I've discovered that there's a connector for Delphi and Mozilla's SpiderMonkey JavaScript engine, called JavaScript Bridge. This looks very promising indeed, and JavaScript as a scripting language would be more universally acceptable than Pascal nowadays. I'm still looking at examples and doing research; at the moment, the main issues are support for D2005+ (the code only goes up to D7); WideString support (I've found a cryptic compiler switch in the jsconfig.inc file that seems like it might enable it); and licensing/redistribution (the Moz components are MPL, but it also requires MSVCR70.dll).
I'll build a test app as soon as I get a chance.
DB from Pacific and Asian has a batch of old SCRIPT files that need converting. Plain search-and-replace won't do for this, and I don't want to create yet another custom application just for this job. It occurred to me that I might be able to build a Pascal interpreter into Transformer, so that Transformer operations could also be actual script blocks. I've done some research and testing today, and discovered that there are two candidates: the JEDI TJVInterpreter, and RemObjects Pascal Script. The former is too simple and completely undocumented. The latter would be great except that it has no packages for Delphi 2005, so I'm stuck with no GUI components or property editors.
I spent the day getting a test application organized, and got a proof-of-concept code block to run. I'm able to execute basic functions that use basic types. However, I've not yet successfully imported classes such as TStringList; when I try to do so, I get no errors, but the function produces no result. I need to figure this out, because I'll also have to be able to use TTntStringList or TWideStringList, which will mean writing my own import units and calling them on the model of the built-in ones.
However, if we get this working, Transformer will be twice the tool it is right now; so it's worth a couple of days of hacking and learning.
Incorporated a minor tweak to the way output filenames are configured in the Batch window, making it easier to preserve the original file extension, as well as incorporating updates to the Preferences dialog box arising out of the IMT project. Updated documentation and created a new release.
Paolo Cutini reported an oddity in a tooltip hint, and also provided a new Italian translation, so I fixed the hint and built a new release.
While video was processing itself in the other room, I added the new "ASCII with numeric entities" output format to the output text area in the main screen of Transformer, then updated all the documentation, built an installer, and updated the Website for the new release.
Transformer is a great tool for working on Unicode texts, but today I hit a problem, in that I needed to work on Hot Potatoes data files, which are not actually Unicode; they're 8859-1 files with all upper-ascii characters escaped to numeric escapes.
It seemed a shame not to be able to work on those files, since the actual underlying format is Unicode (they use Unicode characters, but encode them as numeric entities). So I made a couple of changes to Transformer to make that possible. First, I analysed the file load routines: there are two, one for the source text in the main screen, and one used to load files during batch operations. These were both failing to load plain ascii files, because they appear to be UTF-8 with no BOM, but they're not. So I added a couple of lines so that, in the event of a failure to load a file as UTF-8, it will be loaded as plain ASCII and then turned into a WideString.
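The load fallback described above might look something like this in JavaScript. It is a sketch only: the real routines are Delphi, decodeWithFallback is a made-up name, and Latin-1 is assumed as the single-byte fallback because the files in question are 8859-1.

```javascript
// Try strict UTF-8 first; if the bytes are not valid UTF-8, treat each
// byte as one character (which is exactly the ISO 8859-1 mapping).
function decodeWithFallback(bytes) {
  try {
    // fatal: true makes the decoder throw instead of inserting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return Array.from(bytes, b => String.fromCharCode(b)).join('');
  }
}

// "café" as UTF-8, and the same word as 8859-1 (invalid as UTF-8).
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xC3, 0xA9]);
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);
console.log(decodeWithFallback(utf8Bytes) === decodeWithFallback(latin1Bytes)); // true
```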
Secondly, I needed a way to save a file as ASCII or ANSI, but deal with any characters over 127. I added an option to the batch window to save as "ASCII with numeric entities", which escapes all characters above 127 to HTML-style numeric entity references, and then saves as ASCII.
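As an illustration of the "ASCII with numeric entities" idea, here is a small JavaScript sketch (not the Delphi routine in Transformer; the function name is invented). It uses decimal references, which is an assumption, since the post doesn't say whether decimal or hex is written.

```javascript
// Replace every character above 127 with an HTML-style numeric entity,
// leaving a string that is safe to save as plain ASCII.
function toAsciiWithEntities(text) {
  let out = '';
  // for...of iterates by code point, so characters outside the BMP
  // become one entity rather than two surrogate halves.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 127 ? `&#${cp};` : ch;
  }
  return out;
}

console.log(toAsciiWithEntities('caf\u00e9')); // caf&#233;
```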
This all seems to be working well, but it needs to be documented, and the same save function should also be added to the output text save dialog and routine, for symmetry. Adding this as a task so that I get around to doing that.
Problems with the horrible phpMyAdmin interface and unreliably-encoded tab-delimited text files eventually succumbed to a two-headed attacking force armed with two operating systems.
I thought I'd check out the Scraps project on Safari, to see how well the JavaScript components work, and I see only a blank page when I go to the link in your previous post. I also tested on Opera, and I get a blank page there too. We should organize some proper testing for Scraps on the KHTML browsers (Safari and Konqueror) and on Opera.
Paolo's bug report was absolutely correct; the "Files changed: " message was actually embedded in the code. Added a TTntStaticText component to hold it, which makes it translatable. In the process, normalized the names of two other message components to "ust[blah]" instead of "ustmsg[blah]", to match all the others in that form.
The task below completed -- turned out I needed to make an explicit call in the toolbar resize event to reposition the sequence TTntListView control. Fixed the bug, built a new installer, and released version 1.1.0.6. Also fixed links on the Transformer and Image Markup Tool sites to the project blogs (which have now changed location).
Testing on various platforms, especially on Vista yesterday, where we set up a non-standard font/DPI setting, reveals a minor bug in the toolbar sizing in Transformer. It seems to afflict the top left toolbar (sequences) in the main screen. When icons are set to larger than 24px, the toolbar does not auto-size. This might just be the AutoSize setting being false (check all toolbars in the app), but it could also relate to the resize code handling the display of the grid component below it.
Integrated the Nuvola dll, as done previously for IMT. In the process, found a couple of other libraries common to IMT which were still linked to the original icons.pas file, so I fixed those and rebuilt the IMT (just making the executable smaller).
Also figured out a workaround for the issue of restricted characters in replace sequences reported by Paolo Cutini (they could be saved, but then the file could not be reloaded). I'm now escaping those characters to maintain valid XML 1.0.
Built an installer, did the release, updated the Website and source code archive, etc. etc.
Created a workaround for this Feb 7.
Had some correspondence over the weekend with Dieter Köhler (author of the OpenXML code I'm using to save and load files), and he confirmed that the XML 1.0 specification disallows characters in this range. The question now is how to handle situations in which people insert these characters. The XDom code is asymmetrical in that it fails to raise any error when saving a node containing illegal characters, but it does raise an error when trying to read them back in; I need to allow for this by somehow escaping these characters myself.
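One way to do that kind of escaping, sketched in JavaScript. The backslash-based scheme here is purely hypothetical (the post doesn't say what encoding Transformer actually uses), and the function names are invented; the range of forbidden characters comes from the XML 1.0 Char production.

```javascript
// Characters XML 1.0 does not allow in content: the C0 controls other than
// TAB, LF and CR, plus U+FFFE and U+FFFF. (Lone surrogates are also
// disallowed but are not handled in this sketch.)
const FORBIDDEN = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]/g;

// Hypothetical scheme: forbidden characters become \xHHHH, and literal
// backslashes are doubled so the process can be reversed unambiguously.
function escapeForXml10(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(FORBIDDEN, ch =>
      '\\x' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

function unescapeFromXml10(text) {
  return text.replace(/\\(\\|x([0-9A-F]{4}))/g, (match, whole, hex) =>
    hex ? String.fromCharCode(parseInt(hex, 16)) : '\\');
}

const original = 'before\u001Cafter';
const saved = escapeForXml10(original);
console.log(saved);                                 // before\x001Cafter
console.log(unescapeFromXml10(saved) === original); // true
```

Escaping on save and unescaping on load removes the asymmetry: nothing illegal is ever written, so nothing fails on the way back in.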
Posted time spent researching this and checking into my code. Also made it a task to add the relevant code to the app.
Complete: created a reasonable workaround for this on Feb 7.
Paolo Cutini is using Transformer to recover some old WordStar word-processor files, and encountered a problem reloading the sequence file he had saved. The file appears to be corrupted; a control character occurs throughout the document. oXygen reports:
F An invalid XML character (Unicode: 0x1c) was found in the element content of the document.
That character is "INFORMATION SEPARATOR 4" or "file separator". It wouldn't normally be found in a Unicode document. However, that character is in the Unicode specification, so it ought to be somehow encoded in a format that UTF-8 can handle. This may be a limitation of the XDom engine I'm using for XML file handling, it could be a bug in my code, or it could be that I should automatically exclude control characters on the basis that they shouldn't show up in a text document. I'll look into it. Transformer is intended for working on Unicode texts, rather than ancient word-processor formats, but I do like the idea of using it to retrieve this old data; the program was written as part of a project to rescue some old DOS WordPerfect data, after all.
Entering this as a task with a long deadline, because it's not a major thing; what needs to be done is to investigate the code which uses XDom to save files, and see if the file data is being correctly encoded in UTF-8; if so, look into the specs and see if UTF-8 is supposed to handle this character, and if so, whether it should be somehow encoded or escaped.
While using the program, Stewart found the following bug:
Regular expression syntax was checked in a replace pair even when "Use regular expressions" was not selected. This meant that an expression which was not a valid regular expression would be rejected and could not be used, even if it was not intended as a regular expression.
Fixed this bug, documented it on the site, built a new release package, and released 1.1.0.3. In the process, I converted the Transformer installer build system to use the Inno Setup Preprocessor like IMT does, to make future releases easier and quicker.
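The logic of the fix can be shown in a few lines of JavaScript (a sketch, not the Delphi code; validateFindText is a made-up name): the find text is compiled as a regular expression only when the option is switched on.

```javascript
// Validate the "find" half of a replace pair. Literal searches are always
// acceptable; only regex searches need to compile.
function validateFindText(findText, useRegex) {
  if (!useRegex) return { ok: true };
  try {
    new RegExp(findText);
    return { ok: true };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

console.log(validateFindText('a(b', false).ok); // true
console.log(validateFindText('a(b', true).ok);  // false
```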
Did a new release of the program (1.1.0.2) incorporating:
- use of DocLauncher for Help files and Tutorial inside the app.
- setting of Scaled = False and AutoScroll = False on all TTntForms.
- bugfixes for the translation implementation arising out of work on the Image Markup Tool.
Following the development of a PDF documentation system built on the existing DocBook Help system, another release should be made, also incorporating fixes for any bugs which emerge before the end of the year.
Help for Transformer is built using a system we are developing which incorporates the Image Markup Tool (for interactive screenshots) and DocBook files. Currently this generates only an interactive HTML Help system, but it will eventually also generate printable PDF documentation. Transformer is the testbed project for this documentation system, which will be used to document all our projects in the future.
Link: http://portal.tapor.ca/news-feeds/2.rss/156
The first full version (1.1) of UVic's Transformer open-source Unicode search-and-replace tool has been released.
Back in February, we announced a beta version of this application; the full package is now complete, including documentation and source code, and is available from here: http://www.tapor.uvic.ca/~mholmes/transformer/ Transformer was created as part of a project to rescue some very old linguistics data, which was stored in a combination of Lexware and DOS WordPerfect files, by converting it to Unicode. Non-ascii characters were represented in the data by nasty sequences of control characters used to switch between obsolete character-sets and long-gone fonts in WordPerfect. In order to convert the data, we had to create and test a huge sequence of search-and-replace operations which would find these strings and replace them with the correct Unicode codepoints for IPA characters. To make this process easier for ourselves, we created Transformer, a Windows application which enables you to create, organize and test sequences of search/replace operations (including regular expressions), then run them in batch mode on a set of files. It is released as open-source under the MPL 1.1.
November 1 2006:
- Added a Help topic for the Translation screen (interactive, with IMT).
- Added an Acknowledgements topic (DocBook XML directly coded).
- Began testing the output across various browsers.
- Determined that IE7 still has the bug relating to the absence of text in a div -- can only respond to onmouseover if there is text.
- Added some text to mitigate this, but it's impossible to make the text large enough and position it correctly to make it work well. This will have to be studied in more depth.
- Determined that Safari and Konqueror have a bug related to positioning, so offsets of click areas are wrong.
- Discovered that offsetTop and offsetLeft are calculated differently by the different browsers -- some are relative only to the offsetParent, which may be the doc and may be the actual parent. Adapted a function from the Web for calculating it recursively, which fixes that problem.
- Found another problem related to the hash in the location.hash property; Safari and Konqueror may sometimes double it. Worked around that.
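For the offsetTop/offsetLeft item in the list above, the recursive calculation is along these lines. This is a generic sketch of the well-known technique, not the exact function adapted for the Help system; absoluteOffset is an invented name.

```javascript
// Sum offsetLeft/offsetTop up the offsetParent chain, so the result is
// relative to the document whatever each browser treats as the offsetParent.
function absoluteOffset(element) {
  if (!element) return { left: 0, top: 0 };
  const parent = absoluteOffset(element.offsetParent);
  return {
    left: parent.left + element.offsetLeft,
    top: parent.top + element.offsetTop,
  };
}

// Plain objects standing in for DOM elements, to show the arithmetic.
const body = { offsetLeft: 0, offsetTop: 0, offsetParent: null };
const container = { offsetLeft: 10, offsetTop: 20, offsetParent: body };
const clickArea = { offsetLeft: 5, offsetTop: 7, offsetParent: container };

console.log(absoluteOffset(clickArea)); // { left: 15, top: 27 }
```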
One more problem with Safari not finding the data it needs to show in a popup. The problem may relate to this:
- I'm using innerHTML to copy the contents of a node to another node.
- The contents which are copied contain elements with unique ids.
- Therefore the copy operation will create duplicate ids.
This might be a solution:
- Move elements which are to be displayed as popups, rather than copying them.
- Move them back to be children of the body element and set them to display: none; before moving another element into the popup box.
- This will change the basic document structure, but none of the JavaScript will care at this point, since pointers have already been harvested on load.
- Tested on IE6 and found it's close enough to working to make it worthwhile to hack the popup code and make the popup position itself correctly in IE6. Will also do this tomorrow.
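The move-instead-of-copy idea could be sketched like this. PopupHost and its method names are hypothetical and this is not the actual Help system code; it relies only on the standard DOM behaviour that appendChild moves a node which already has a parent, so ids are never duplicated.

```javascript
// Shows one element at a time inside a popup box by moving it there, and
// moves it back to a storage parent (e.g. the body) when it is replaced.
class PopupHost {
  constructor(popupBox, storage) {
    this.popupBox = popupBox; // element that displays the popup content
    this.storage = storage;   // where hidden popup elements normally live
    this.current = null;
  }

  show(element) {
    this.hide();
    this.popupBox.appendChild(element); // moves the node; no copy is made
    element.style.display = 'block';
    this.current = element;
  }

  hide() {
    if (!this.current) return;
    this.current.style.display = 'none';
    this.storage.appendChild(this.current); // put it back before the next one
    this.current = null;
  }
}
```

In a page this would be used as something like `new PopupHost(document.getElementById('popup'), document.body)`, where the 'popup' id is again just an assumption for the example.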
November 2 2006:
- Rewrote the DOM code to avoid innerHTML and intelligently move elements around instead of copying them.
- After proving the concept works, encapsulated this code in a DisplayHost object which can handle popups automatically.
- Tested on all browsers -- OK!
- Decided it's worth trying to support IE6, so added some special handling to make the popup work on IE6 too.
- Tested -- working.
- Began working on the problem of empty div areas not being clickable in IE. No decent solution, so we currently have a hack which does the best job possible but still leaves the edges of divs unclickable. Sad, but unavoidable; IE is crap all round (even 7, which has the same bug).
- Updated the source code on the server, and documented the mdhHelp.pas library.
- Built an installer, and tested it. Seems OK.
- Tested the Help invocation system when IE is the default browser. It launches the browser, but fails to add the hash to the URL, so it doesn't navigate to the context. Will have to work on this.
- Updated the source code on the server, and added new description files where required.
November 3 2006:
- Tested help system on IE6 and IE7. 6 is OK, but 7 can't handle the hash in the path which directs to a specific topic, so it's hopeless for context-sensitive help. Tested on Opera and that works fine, so:
- Rewrote the Help launching system so it looks specifically for Firefox, and failing that for Opera, before falling back to the default handler for .htm. That means that if Opera or FF is installed on the system, they will be used in preference to IE.
- Reworked the Transformer Website. In the process, added more files to the Help system, and more items to the glossary.
- Rebuilt the Help and tested it.
- Built the installer, and tested it.
- Released the application by updating the Website information.
- Posted a topic to the TAPoR news thread.
- Created an updated description of the project for the new HCMC site, and posted an inc file that can be pulled in to the HCMC projects page.
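The browser preference order used by the rewritten Help launcher (Firefox, then Opera, then whatever handles .htm) boils down to a simple ordered lookup. Here is a sketch in JavaScript; the real launcher is Delphi, and the names and the way installed browsers are detected are assumptions for the example.

```javascript
// Browsers known to handle the #hash in a local Help URL, in order of preference.
const PREFERRED_BROWSERS = ['firefox', 'opera'];

// Given the browsers found on the system, choose which one should open Help.
// 'default' means: fall back to the system handler for .htm files.
function chooseHelpLauncher(installed) {
  const found = PREFERRED_BROWSERS.find(name => installed.includes(name));
  return found || 'default';
}

console.log(chooseHelpLauncher(['iexplore', 'opera']));  // opera
console.log(chooseHelpLauncher(['opera', 'firefox']));   // firefox
console.log(chooseHelpLauncher(['iexplore']));           // default
```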