Transformer Help

A guide to using the Transformer program

Edition: 2.0 (2009)

Martin Holmes (mholmes@uvic.ca)

Table of Contents

  1. Introduction to Transformer
  2. Installing and uninstalling Transformer
  3. What's new in this version?
  4. The main screen of Transformer
  5. The Main Screen (Sequence area)
  6. The Main Screen (Source Text area)
  7. The Main Screen (Output Text area)
  8. The Replace Pair dialog box
  9. The Script Item dialog box
  10. The Batch Processing Screen
  11. What file types does Transformer handle?
  12. The Preferences dialog box
  13. The File Overwrite confirmation dialog box
  14. The interface translation screen
  15. Acknowledgements
  16. How to use this help file
  17. Glossary
  18. Index

Introduction to Transformer

Transformer is a somewhat specialized utility for doing complex transformation operations on multiple Unicode text files.

Transformer loads Unicode files (UTF-8 or UTF-16) and performs sequences of search-and-replace or script transformation operations on them. It provides you with an interface to create and test these sequences of operations before running them in batch mode on a set of files.

We wrote version 1 of Transformer to assist in the rescue of textual data from obsolete file formats such as DOS word-processor files. Sometimes, it proves impossible to rescue old data through normal means such as running the original application and exporting to a format which can be imported by a more modern program. Perhaps the program which created the data no longer exists or cannot be run on available hardware, or perhaps it cannot save to any format but its own. In these circumstances, it is often necessary to rescue the data by a process of manual conversion, identifying blocks of text or other data in the original file which can be replaced with Unicode characters, and building sequences of replacement operations which gradually convert the data. This often requires an exhaustive trial-and-error approach in which replace operations are added incrementally and tested in various sequences before a reliable sequence is developed; then the sequence can be run on a group of files. Transformer is designed to provide as efficient a working environment as possible, with very fast completion times for replace operations, so that the user can focus on the task itself rather than the machinery.

After completing version 1, we began to use Transformer on more complex and varied projects, and determined that it would benefit greatly from being able to do more than simple search-and-replace. Version 2 of Transformer therefore includes a scripting capability, using ECMAScript, also known as JavaScript. This functionality is based on the excellent Mozilla SpiderMonkey JavaScript engine, which is bundled with Transformer as a dynamic link library. The source text is supplied in the form of a JavaScript string variable, which you can modify using script code; then you return the result in the form of another JavaScript string variable.

Installing and uninstalling Transformer

Transformer is distributed as a self-extracting installer setup_transformer_XXXX.exe, where XXXX represents the version number. To install the program, simply run the file by double-clicking it. The installer gives you the option of installing all the program source code; you would only want to do this if you are a Delphi programmer thinking about contributing to the project.

To uninstall Transformer, use the Windows Add/Remove Programs Control Panel applet.

To upgrade to a newer version of the program, we recommend uninstalling your previous version before installing the newer version.

What's new in this version?

Version 2.0.0.0 introduces the following changes:

Version 1.1.2.0 introduces the following changes:

The following changes were introduced between version 1.1.0.6 and version 1.1.0.8:

The main screen of Transformer

The main screen is divided into three areas: the Sequence area, the Source text area, and the Output text area.

The Main Screen (Sequence area)

Main menu items

The main menus in this screen give you access to all the commands available. Each area of the screen has its own custom toolbar containing a small subset of these commands.

File commands

These buttons allow you to create, load and save sequence files. Sequence files are collections of search/replace and script operations, stored in XML format.

Transformation operation controls

These toolbar buttons allow you to add new search/replace or script operations, delete selected operations, and move operations up and down in the sequence. When you add or edit an operation, the program will show the Replace pair dialog box (for a search/replace operation), or the Script item dialog box (for a JavaScript operation).

Transformation operation listbox headers

You can click on the header elements in this listbox in order to sort the operations alphabetically, based on the header. For instance, if you click on the Name header, the sequence will be sorted based on the name you have assigned to each operation. Be careful with this feature; the sequence in which transformation operations are done is often critical to the success of the overall operation, so it may be important to maintain a working sequence.
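To see why the order of operations matters, consider two replace pairs applied in both orders. The sketch below is plain JavaScript written purely for illustration; the pair names, the find and replace strings, and the applyPairs helper are all assumptions, not part of Transformer itself.

```javascript
// Hypothetical replace pairs; names and strings are invented for illustration.
var pairs = [
  { name: "A to B", find: "a", replaceWith: "b" },
  { name: "B to C", find: "b", replaceWith: "c" }
];

// Apply each pair in list order, replacing every occurrence of the find string.
function applyPairs(text, sequence) {
  for (var i = 0; i < sequence.length; i++) {
    text = text.split(sequence[i].find).join(sequence[i].replaceWith);
  }
  return text;
}

// Same pairs, same source text, two different orders.
var inOrder = applyPairs("abc", pairs);                   // "a" to "b" first, then "b" to "c"
var reversed = applyPairs("abc", pairs.slice().reverse()); // "b" to "c" first, then "a" to "b"

console.log(inOrder);  // ccc
console.log(reversed); // bcc
```

Sorting the list by a column header can reorder a sequence in just this way, which is why a working sequence is worth preserving.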

You can also drag items around in the sequence to re-order them, or use the up/down arrow buttons to move them.

JavaScript operations have only "[Script]" in both the Find and Replace with columns, so they should always sort together when you click on these column headers.

Transformation sequence items

Each item in the listbox represents one search/replace or JavaScript operation. A replacement operation has a name (an arbitrary description you give it, for your own purposes), a "Find" string, a "Replace with" string, and a checkbox representing whether the item is "turned on" or not. Script operations also have a name and checkbox, but they all have "[Script]" in the other two columns, because the operation of a script is not predictable. If the Checked items only checkbox at the bottom of the screen is checked, then items in the list which are not checked will be ignored when the sequence is run.

Do transformations button

When you click on this button, the sequence of operations listed above will be run against the source text in the source text box on the right, and the results will be placed in the output text box.

Checked items only

If you want to restrict the transformation operations which are run when you press the Do transformations button, you can check this checkbox, and then click the checkbox next to each of the operations you want to run. Unchecked operations will then be ignored.
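The effect of this checkbox can be sketched in a few lines of JavaScript. This is only an illustrative model under stated assumptions: the field names (name, find, replaceWith, checked) and the runSequence function are hypothetical, and plain string replacement stands in for whatever Transformer does internally.

```javascript
// Hypothetical in-memory model of a sequence; field names are assumptions.
var sequence = [
  { name: "Tabs to spaces", find: "\t", replaceWith: " ", checked: true },
  { name: "Remove carets",  find: "^",  replaceWith: "",  checked: false }
];

// Run the items in order. When checkedOnly is true, unchecked items are skipped.
function runSequence(text, items, checkedOnly) {
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    if (checkedOnly && !item.checked) {
      continue; // ignored, as described above
    }
    text = text.split(item.find).join(item.replaceWith);
  }
  return text;
}

var source = "one\ttwo^";
var allItems = runSequence(source, sequence, false);   // every item runs
var onlyChecked = runSequence(source, sequence, true); // only "Tabs to spaces" runs

console.log(allItems);    // one two
console.log(onlyChecked); // one two^
```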

Current sequence file

The path to the current sequence file is shown in the status bar.

The Main Screen (Source Text area)

Toolbar buttons

The Source Text toolbar buttons provide only two functions, both for loading files. The first allows you to load a text file from disk, and the second will load a binary file. In most cases, you should use the first option, but you may occasionally need to do search-and-replace on a binary file. When binary files are loaded, problem bytes such as control characters are converted into human-readable numeric representations; for instance, a byte with a value of 11, which represents a vertical tab, is converted to . Of course, once you convert a binary file in this way, it cannot function as a binary file any more; it is essentially a text file.

Current source text file

The location of the current source text file is shown in the small status bar below the source text display.

Source text display

The source text is displayed in the text area at the top right of the main screen. You can edit the source text here, with no risk of overwriting the original file, because there is no mechanism for saving changes.

The Main Screen (Output Text area)

Save output file

This toolbar button allows you to save the output of your transformation operations to a file.

Copy output to input

If you click on the up-arrow button in the output text toolbar, the contents of the output text box below it will be copied into the source text area above, thus becoming the source for any future replacement operations.

Escape/Unescape non-ASCII characters

Although Transformer is fully Unicode-capable, it is sometimes useful to be able to convert non-ASCII characters into hexadecimal escape sequences so that they can be processed by non-Unicode systems. Pressing the first button will escape any character above #127 to a hexadecimal escape sequence, and pressing the second button will convert numerical escapes back to Unicode characters.
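The round trip can be sketched as follows. This help file does not specify the exact escape format Transformer writes, so the &#xHHHH; form used here is an assumption chosen for illustration, and the two function names are hypothetical. The sketch works on 16-bit code units, in keeping with the program's internal representation.

```javascript
// Escape every 16-bit code unit above 127 as &#xHHHH; (assumed format).
function escapeNonAscii(text) {
  return text.replace(/[\u0080-\uFFFF]/g, function (ch) {
    return "&#x" + ch.charCodeAt(0).toString(16).toUpperCase() + ";";
  });
}

// Convert &#xHHHH; sequences back into characters.
function unescapeNonAscii(text) {
  return text.replace(/&#x([0-9A-Fa-f]+);/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var original = "caf\u00E9";                // "café", written with an escape to keep this sketch ASCII
var escaped = escapeNonAscii(original);    // caf&#xE9;
var restored = unescapeNonAscii(escaped);  // same string as original

console.log(escaped);
console.log(restored === original); // true
```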

Output text area

The results of your transformation sequence will be shown in the output text area. You can save these results using the Save toolbar button above it, or the corresponding items on the File menu.

Saved file

Once you have saved the output to a file, the path to the file will appear in the status bar below the output text box.

The Replace Pair dialog box

Name for replace pair

Each "replace pair" in your sequence can be given a distinct name to help you remember exactly what it does. You can enter anything you like for the name.

Text to find

In this text box, enter the sequence of characters you would like to replace.

Ignore case

If you want to replace both upper- and lower-case versions of your text, check this checkbox.

Use regular expressions

If you know Perl regular expressions, you can enter a search string which uses them, and check this checkbox. Please note that the open-source regular expression engine used by Transformer is a fairly primitive library (TURESearch, part of the JEDI Code Library) and does not support all Perl regular expression syntax.

Replacement text

Type or paste the replacement text you want to use into this text box.

OK and Cancel buttons

Pressing OK will save the details of your find/replace pair; if you press Cancel, the previous settings for this pair will remain unchanged.

The Script Item dialog box

Name for script item

Each replace pair or script item in your transformation sequence can be given a distinct name to help you remember exactly what it does. You can enter anything you like for the name.

Code editor

This is where you type your JavaScript code. The editor uses syntax highlighting and has line numbering to help you edit the code. The instructions at the top are important: the source text arrives in a string variable called JSInput, and you store your transformed version of it in a second variable called JSResult. You can check your code for syntax errors using the Check code button; the JavaScript engine will try to compile your code, and give you feedback on any errors it finds, with line numbers where appropriate.

OK and Cancel buttons

Pressing OK will save the script; if you press Cancel, the previous settings for this script will remain unchanged. Before saving, the program will syntax-check the JavaScript, and give you an error message if it finds a problem. You cannot save script code which has syntax errors.

Check code button

You can check your code for syntax errors using the Check code button; the JavaScript engine will try to compile your code, and give you feedback on any errors it finds, with line numbers where appropriate.

Local code and Global code tabs

The Local code tab shows the code editor containing JavaScript which is local only to this script operation. In simple operations, this will be all you need. However, on a more complex project, you may want to store functions, classes etc. which you can re-use in multiple script operations. If you click on the Global code tab, you'll see another code editor. Any code written in this editor is global to the project, meaning that it's always available for any individual script operation to use. Global code is also left in place when you create a new sequence; more often than not, you'll want to re-use at least some global script code across projects.
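As a rough sketch of how the two tabs might be used together, the example below puts a reusable helper in the Global code and calls it from the Local code of one script item. The collapseWhitespace function is hypothetical, and JSInput is given a sample value only so that the sketch can be run outside Transformer; inside the program, that variable is supplied for you.

```javascript
// --- Global code tab: a hypothetical helper, available to every script item ---
function collapseWhitespace(text) {
  // Replace each run of spaces and tabs with a single space.
  return text.replace(/[ \t]+/g, " ");
}

// --- Local code tab: one script item ---
// Stand-in for the source text; Transformer provides JSInput at run time.
var JSInput = "alpha   beta\t\tgamma";

// The local code only needs to call the shared helper.
var JSResult = collapseWhitespace(JSInput);

console.log(JSResult); // alpha beta gamma
```

Keeping helpers like this in the Global code means that a fix made once applies to every script item that calls them.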

Instructions

Pay careful attention to the instructions here. When the script operation runs, the contents of the source text are provided to it as a string variable called JSInput. Your script code should operate on this variable to perform whatever transformation it needs to do. When your operations are complete, you should store the results in a variable called JSResult. The program will take the contents of this variable and feed them to the next operation in the sequence.
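A minimal script following these instructions might look like the sketch below. It numbers each line of the source text, which is something a fixed search/replace pair cannot do. The sample value assigned to JSInput is an assumption so that the sketch runs on its own; in Transformer you would leave that line out, because the program supplies the variable.

```javascript
// Stand-in for the source text; Transformer provides JSInput at run time.
var JSInput = "first entry\nsecond entry\nthird entry";

// Split the text into lines and prefix each one with its line number.
var lines = JSInput.split("\n");
for (var i = 0; i < lines.length; i++) {
  lines[i] = (i + 1) + ". " + lines[i];
}

// Store the transformed text where the next operation will pick it up.
var JSResult = lines.join("\n");

console.log(JSResult);
```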

The Batch Processing Screen

The Batch Processing screen allows you to apply the replacement sequence which is open in the main screen to a list of files (a "batch") all in one operation, saving the results to disk. The top half of the screen shows the list of files which will be changed. You can add or remove files from the list using the plus and minus buttons on the toolbar.

At the bottom of the screen is a tabbed interface where you can choose settings for your operation. The first tab, Saving files, allows you to choose an output encoding for the files that will be saved. The second, Save location, lets you choose where to save the new files. In the third tab, Output filename, you can establish a pattern for naming the new files, based on the original filenames. If you simply want to overwrite the original files, you can choose to save in the same location and with the original filename; however, using a different location and/or filename allows you to preserve the original files in case something goes wrong. You can also guard against problems by setting a backup location in the Backup tab.

After choosing a list of files and setting your preferences, press the Go! button to start the batch operation. The program will show its progress as it works through the files, and report back at the end of the process with details of how many files were changed, and how many replacements were made.

If you are going to make use of the same batch operation regularly, you may want to save it as a file. You can do this using the commands on the File menu or the equivalent toolbar buttons.

What file types does Transformer handle?

Transformer works only with Unicode text; when a file is loaded, it will be turned into a stream of 16-bit characters internally (32-bit Unicode is not supported at this point). When loading a file, this is how the program decides what to do:

UTF-8 files without Byte Order Marks are common, because some systems and applications cannot handle byte-order marks. If you know your files are UTF-8, but they don't have BOMs, then you can add a BOM to them in the following way:

All your files will have a UTF-8 BOM added to them, so the program will definitely understand how to read them. You can remove BOMs from UTF-8 files in a similar way, by pressing Control + Alt + C.

When you do batch transformations or save files from the Output Text box, you have the option of adding BOMs to your UTF-8 files or not. Whether you do so is up to you, and depends on what you're going to do with the files later. If you know you will NOT be using the files in contexts where the BOM will cause problems, then it's recommended that you add a BOM.
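For reference, the UTF-8 byte order mark is the three-byte sequence EF BB BF at the very start of a file. The sketch below shows what adding and removing it amounts to at the byte level; the function names are hypothetical and this is not Transformer's own code.

```javascript
// The UTF-8 byte order mark: EF BB BF.
var UTF8_BOM = [0xEF, 0xBB, 0xBF];

// True if the byte array starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 &&
    bytes[0] === UTF8_BOM[0] &&
    bytes[1] === UTF8_BOM[1] &&
    bytes[2] === UTF8_BOM[2];
}

// Return the bytes with a BOM in front, unless one is already there.
function addUtf8Bom(bytes) {
  if (hasUtf8Bom(bytes)) {
    return bytes;
  }
  var out = new Uint8Array(bytes.length + 3);
  out.set(UTF8_BOM, 0);
  out.set(bytes, 3);
  return out;
}

// Return the bytes without a leading BOM, if there is one.
function removeUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.subarray(3) : bytes;
}

var plain = new Uint8Array([0x68, 0x69]); // the ASCII text "hi"
var marked = addUtf8Bom(plain);           // EF BB BF 68 69
var stripped = removeUtf8Bom(marked);     // 68 69 again

console.log(hasUtf8Bom(plain), hasUtf8Bom(marked), stripped.length);
```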

If you want to use Transformer on ASCII files, and get ASCII files as output, then simply choose to save them as UTF-8 without a BOM. The first 128 characters in UTF-8 (codepoints 0 to 127) are saved as single-byte characters, and are the same as the ASCII set, so an ASCII file containing only these characters is identical to the same file in UTF-8 without a BOM. If you want to work on ANSI files, and keep them in ANSI format, then Transformer is not the right tool for the job.
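The equivalence between ASCII and BOM-less UTF-8 can be checked with a short sketch. It uses the standard TextEncoder API found in modern browsers and Node.js, not anything from Transformer, and the sample strings are arbitrary.

```javascript
// TextEncoder always produces UTF-8 and never writes a BOM.
var encoder = new TextEncoder();

// For pure ASCII text, the UTF-8 bytes are simply the ASCII codes.
var sample = "Plain text";
var utf8Bytes = Array.from(encoder.encode(sample));
var asciiCodes = sample.split("").map(function (ch) {
  return ch.charCodeAt(0);
});
var identical = JSON.stringify(utf8Bytes) === JSON.stringify(asciiCodes);

// A character above 127 takes more than one byte in UTF-8.
var accentedBytes = Array.from(encoder.encode("\u00E9")); // e-acute, codepoint 233

console.log(identical);     // true
console.log(accentedBytes); // [ 195, 169 ]
```

The second result shows why files containing characters above 127 are not ASCII once saved as UTF-8.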

You have one final option when saving files: ASCII with numeric entities. This saves files in ASCII format, by converting each character with a codepoint above 127 into a numeric entity (such as &#233; for codepoint 233, the e-acute). This preserves the values of Unicode characters, and if the file format is HTML or XML, then they will remain accessible because numeric entities are supported in these formats.

The Preferences dialog box

The Preferences dialog box enables you to control the environment of the program. You can set the fonts for two different types of element:

You can also choose the length of time for which tooltip hints are displayed, in seconds (set this to zero to turn off tooltips completely), and you can choose between four different sizes for the button images displayed on the application's toolbars. Finally, you can load an interface file to change the entire interface of the program to another language.

Once you have selected your preferences, you can press the Preview button to test out your choices. The interface of the program will be changed according to your selections, but if you decide that you don't like the result, you can simply press the Cancel button to undo the changes.

The File Overwrite confirmation dialog box

The file overwrite confirmation dialog box should appear whenever you are about to do an operation which involves saving multiple files, where some of those files already exist. Normally, when you save a file to disk, if the file already exists you will see a simple dialog box which asks you whether you want to overwrite it or not. When the operation involves multiple files, a more complex dialog box is needed.

When the dialog appears, it will show all of the files you're about to overwrite in a list, with each one checked. If for some reason you don't want to overwrite a particular file, just uncheck it in the list. Then you can press OK to continue with the operation. If you want to cancel the operation completely, press Cancel.

The interface translation screen

Menus

The File and Edit menus give access to the same functions which are available from the toolbar.

File commands

Using the file commands on the toolbar and on the File menu, you can create a new translation file, load a previously-saved file, and save the file you're working on at the moment. A translation file is an XML file which contains strings of Unicode text for all the labels, captions, hints and titles in the program.

Edit commands

Standard edit commands are available when you edit text in the text boxes on the right of the screen.

Close button

Press this button to close the translation window and return to the main application window.

Controls in the program

The tree control on the left of the screen gives you access to the structural hierarchy of the program. Each node in the tree represents one object in the program, such as a form (= a window), a button or a label. Some objects contain other objects (for instance, forms contain buttons and labels), so the structure is hierarchical. These objects are referred to as items. When you click on a node in the tree, if it has translatable text associated with it, the text will appear in the text boxes on the right. There, you can replace it with a translation, then press OK to store your new text.

Not all items in the program hierarchy have text attached to them. Some will have a title but no hint, and others may have a hint but no title. When translating the interface, you only need to enter translations into the boxes which already contain English text.

Item hint

The hint property of an item is the text which will appear as a tooltip when your mouse hovers over it. For instance, a button in the program may have a short caption such as OK, but if you put your mouse over it you might see a little popup which says "Accept these changes". That text is the hint.

Item caption

Menus, buttons and other clickable controls have captions. You will see that some captions have an ampersand (&) character in them; the effect of this is to make the following character into a hot key, which is underlined in the caption. For instance, if the caption is &OK, then the button will have the caption OK, and pressing Alt + O on the keyboard will cause the button to be pressed.
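The ampersand convention can be illustrated with a small sketch. The parseCaption function is hypothetical and only models the rule described above; it does not handle doubled ampersands or any other special cases that the program's controls may treat differently.

```javascript
// Split a caption such as "&OK" into the text shown on screen and its hot key.
function parseCaption(caption) {
  var index = caption.indexOf("&");
  // No ampersand, or an ampersand with nothing after it: no hot key.
  if (index === -1 || index === caption.length - 1) {
    return { text: caption, hotKey: null };
  }
  return {
    // The ampersand itself is not displayed.
    text: caption.slice(0, index) + caption.slice(index + 1),
    // The character after the ampersand becomes the hot key (used with Alt).
    hotKey: caption.charAt(index + 1).toUpperCase()
  };
}

var ok = parseCaption("&OK");          // text "OK", hot key "O"
var saveAs = parseCaption("Save &As"); // text "Save As", hot key "A"
var plain = parseCaption("Cancel");    // text "Cancel", no hot key

console.log(ok.text, ok.hotKey);
console.log(saveAs.text, saveAs.hotKey);
```

When translating a caption, it is worth checking that the character after the ampersand still makes sense as a hot key in the new language.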

Item title

A few items, mainly dialog boxes, will have a title attribute which shows up at the top or in the title bar.

OK and Cancel buttons

When you have made changes to the text in the text boxes above, you can store your results into the tree structure by pressing OK, or revert to the original text by pressing Cancel.

Note that this does not save your changes to a file on the disk; it merely stores them in memory. To save your changes to disk, use the Save commands on the toolbar or the File menu.

Find function

If you're searching for a particular piece of text in an item hint, caption or title, you can type it in the text box above the Find button and click on Find or Find Next to search for it.

Acknowledgements

Transformer was coded by Martin Holmes from the University of Victoria Humanities Computing and Media Centre, using Borland Delphi 2005. The program was created in close collaboration with Greg Newton, also from UVic's HCMC, while working on a project to rescue a large amount of old data from a linguistics project, stored in WordPerfect/Lexware format.

The following people contributed interface translations:

The following open-source libraries and controls are used in the project:

Transformer also uses Microsoft's msvcr70.dll, which is required by the JavaScript Bridge. This is not open source, of course, but it's widely available online, and anyone with an appropriate licence for a Microsoft development tool should be able to distribute it (we have a copy of Visual Studio).

How to use this help file

This help file is an XHTML Web page that runs in your browser. If you have a modern, standards-based browser, all its functions should be available; if you have an older browser, or a browser which does not support standards properly, then it may not function so well.

To access the Help file, you can press the F1 key in the application at any time; the browser should start up, and the Help file should open, showing the appropriate topic for the area of the application you are using. If there is no particular appropriate topic, it will open at the table of contents.

To search the Help file, you can either look through the Index, or you can use the search capabilities in the browser, by clicking on the Show All button to reveal all the topics, then pressing the appropriate key combination to launch Search in your browser (usually Control + F on Windows).

Glossary

Index