U4 Formats: LZW

LZW is named after Abraham Lempel, Jacob Ziv, and Terry Welch, the brilliant guys who invented this compression algorithm. If you google it, you will find lots of articles and lecture notes about LZW. The details are a little difficult to figure out, but since it’s such an important algorithm, it’s worth studying.

The short description of LZW is this: while the encoder compresses the original file, it builds a look-up table of byte sequences it has already seen, and it writes out the table index of each sequence instead of the sequence itself. However, the LZW encoder doesn’t save the look-up table. It only saves the index data (which is now the compressed data) that refers to the table. The clever part is that the decoder can regenerate the same look-up table, entry by entry, as it reads the index data and decompresses it.
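To make the idea concrete, here is a minimal sketch of textbook LZW in Python. It is not Ultima IV's variant: the function names `lzw_compress` and `lzw_decompress` are my own, the codes are kept as a plain list of integers instead of being packed into fixed-width bit fields, and the table is an ordinary Python dict that never gets reset. The point is only to show that the decoder rebuilds the table without ever receiving it.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW encoder (illustrative only, not Ultima IV's variant)."""
    # The table starts with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while the table knows the sequence.
            current = candidate
        else:
            # Emit the index of the longest known sequence, then learn a new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Textbook LZW decoder: rebuilds the table from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The encoder used an entry it had only just created, so the
            # entry must be the previous sequence plus its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output += entry
        # Learn the same entry the encoder learned at this step.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(output)


codes = lzw_compress(b"ABABABA")
print(codes)                  # [65, 66, 256, 258]
print(lzw_decompress(codes))  # b'ABABABA'
```

Notice that code 258 shows up in the output before the decoder has stored entry 258. That is the one tricky case in LZW decoding, and the `elif` branch handles it.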

There’s one thing to note though. After decoding Ultima IV’s LZW-compressed pictures, the result is Raw data. For some reason, I thought the decompressed data would be RLE data, and so I wasted hours before I realized it was not RLE data but Raw data.

For the look-up table, LZW implementations often use a hash table. You can google hash tables and study them if you really want, and actually I do encourage you to do that, but studying hash tables in general won’t get you far when it comes to the Ultima IV format. Why? Because Ultima IV uses a very unique and peculiar kind of hash table. Among others, Marc Winterrowd is the guy who figured out Ultima IV’s LZW algorithm. Even the xu4 project uses his code directly. I’m not 100% sure, but I think he dived into the source code, if not the compiled executable file itself. Otherwise, I don’t see how anyone could reverse engineer it. That is how strange the algorithm used by Origin Systems (the company that made Ultima IV) is.

Anyway, this is the end of the post. No Python code for Ultima IV’s own decoder today, because Ultima IV’s LZW is, again, way too complicated to explain in a few words.
