Notes on compression and the universal similarity metric
With the proposal for DH accepted, I'm now planning an implementation of the application that I can use for demonstrations and for working with real data, and I've been wondering what platform to use. I've already written one implementation in Delphi (not acceptable as a proper solution, because it's not cross-platform), and a command-line implementation of the basic similarity metric in Java (easy, because zip support is built into Java). The question was whether the benefits of compiled code, and the opportunity to build up my Qt skills, merit doing this project in Qt, assuming there's time before July. On balance I've decided to try it in Qt, for these reasons:
- Qt includes qCompress, which takes a byte array and returns a compressed byte array, using zlib. This should be perfect for our purpose.
- C++ should be MUCH quicker than Java for this sort of processing.
- I need more practice with Qt and C++.
- A cross-platform native app will be as acceptable to the community as a Java one.
I've set up a new Qt project for this, and over the next couple of weeks I'm going to start coding the basic class for zipping, measuring, and calculating the similarity metric.
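As a rough sketch of what that basic class has to do, here is the calculation in plain C++. It assumes the metric is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size; this post doesn't spell out the formula, so treat that as an assumption. The names `ncd`, `CompressedSize` and `lz78PhraseCount` are made up for illustration, and the LZ78 phrase counter is only a stand-in compressor so that the sketch builds without Qt or zlib.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical type: any function mapping a byte string to its compressed size.
using CompressedSize = std::function<std::size_t(const std::string&)>;

// Normalized compression distance (assumed to be the metric intended here):
//   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
double ncd(const std::string& x, const std::string& y, const CompressedSize& c) {
    const std::size_t cx = c(x);       // compressed size of x alone
    const std::size_t cy = c(y);       // compressed size of y alone
    const std::size_t cxy = c(x + y);  // compressed size of the concatenation
    const std::size_t lo = std::min(cx, cy);
    const std::size_t hi = std::max(cx, cy);
    if (hi == 0) {
        return 0.0;  // both inputs compress to nothing; avoid dividing by zero
    }
    return (static_cast<double>(cxy) - static_cast<double>(lo)) /
           static_cast<double>(hi);
}

// Stand-in compressor (NOT zlib): counts the phrases in an LZ78 parse.
// Each phrase is the shortest prefix of the remaining input not seen before.
// The count is a crude proxy for compressed size, used only for illustration.
std::size_t lz78PhraseCount(const std::string& data) {
    std::unordered_set<std::string> seen;
    std::string phrase;
    std::size_t count = 0;
    for (char ch : data) {
        phrase.push_back(ch);
        if (seen.insert(phrase).second) {  // true when the phrase is new
            ++count;
            phrase.clear();
        }
    }
    if (!phrase.empty()) {
        ++count;  // trailing phrase that was already in the dictionary
    }
    return count;
}
```

Calling `ncd(a, b, lz78PhraseCount)` exercises the arithmetic. In the Qt version the compressor could instead be a lambda returning `qCompress(QByteArray::fromStdString(s)).size()`. One thing to keep in mind is that qCompress prepends a 4-byte length header to its output, which adds a small constant to every size.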