Asset loading (Part 2)
When last we saw our hapless hero, he had conquered unformatted input and .TTF files (no, not those .TTF files).
Next up on the big list of asset formats is .CMP.
Not a particularly helpful extension... maybe something CoMPressed?
The original source code loads these files in a function called LoadCompressedTextureSequence(), so yes, there is compression involved!
These files consist of a header (below) and a compressed blob of texture data.
struct CmpHeader {
    int32_t num_textures;
    // This has `num_textures` entries. Each of them is the
    // **uncompressed** size of a texture. Having this information
    // allows us to allocate the right amount of memory to hold the
    // uncompressed textures.
    int32_t texture_sizes[];
};
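As a sketch of how I'd read that header (the `ParsedCmpHeader` type and `ParseCmpHeader` function are names I made up, and I'm assuming little-endian int32s, which is what the PlayStation uses):

```cpp
#include <cstdint>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical sketch: parse a CmpHeader out of a raw byte buffer and
// compute the total uncompressed size we'll need to allocate.
struct ParsedCmpHeader {
    int32_t num_textures = 0;
    std::vector<int32_t> texture_sizes;
};

ParsedCmpHeader ParseCmpHeader(const std::vector<uint8_t>& data) {
    ParsedCmpHeader header;
    std::memcpy(&header.num_textures, data.data(), sizeof(int32_t));
    header.texture_sizes.resize(header.num_textures);
    std::memcpy(header.texture_sizes.data(),
                data.data() + sizeof(int32_t),
                header.num_textures * sizeof(int32_t));
    return header;
}

// Total number of bytes needed to hold all uncompressed textures.
size_t TotalUncompressedSize(const ParsedCmpHeader& header) {
    return std::accumulate(header.texture_sizes.begin(),
                           header.texture_sizes.end(), size_t{0});
}
```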
So that's our input.
What's the output?
A TIMList!
constexpr size_t kMaxTimTextures = 400;

struct TIMList
{
    std::unique_ptr<void, AllocatorDeleter> mem_base;
    size_t mem_base_size = 0;
    uint8_t* mem_offset[kMaxTimTextures];
    size_t mem_offset_size = 0;
};
... which doesn't tell us a whole lot: it's basically just a pointer to a chunk of memory, plus a bunch of offsets into that memory.
We'll get to what a TIM is later, but for now, I'm guessing Texture IMage.
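To connect the two structures: my guess is that `mem_offset` is just a running sum over the header's `texture_sizes`. A hypothetical helper (not from the original code) to illustrate:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kMaxTimTextures = 400;

// Hypothetical: fill a TIMList-style offset table. Texture i starts at
// mem_base plus the sum of the uncompressed sizes of textures 0..i-1.
// Returns the number of entries written (i.e. mem_offset_size).
size_t FillTimOffsets(uint8_t* mem_base,
                      const std::vector<int32_t>& texture_sizes,
                      uint8_t* mem_offset[kMaxTimTextures]) {
    size_t running = 0;
    size_t count = 0;
    for (int32_t size : texture_sizes) {
        mem_offset[count++] = mem_base + running;
        running += static_cast<size_t>(size);
    }
    return count;
}
```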
But before that, we need to decompress the TIMs!
The original code does that in a function called ExpandData(), which lives in a file helpfully called LZSS.CPP.
This is cool, it's time for me to finally learn what LZSS actually is!
I've seen the acronym bounce around when reading about file compression, but never had to care about how it actually works... but now I do!
First things first, LZSS is just the initials of the people involved: Lempel-Ziv-Storer-Szymanski. And right off the bat, I get to right a misconception that I held: LZSS is not a compression format, it's a compression algorithm.
What's the difference, you ask?
Well, if I find a random .zip file, I can go look up the ZIP format specification, implement that, and my program will be able to unpack that file.
In other words, the file (in a header, and in .zip's case, also a footer) and specification contain all the information I need.
Not so for LZSS. The LZSS algorithm, as we'll get into, has a number of parameters, and LZSS-compressed data does not include a header (or footer) to tell us what those parameters are. Without knowing the parameters (and implementation choices), it's impossible to unpack our data.
Lucky, then, that we have the source code, which tells us all of those things!
I'm not going to paraphrase the entire Wikipedia page here, but in a nutshell, LZSS works by having a "look-behind" window, and represents data as either
- literal bytes, if the data doesn't already appear in the current window
- an offset into that window and a length to read, i.e. "you've already seen the next 24 bits, and that was 64 bits ago"
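To make the back-reference idea concrete, here's a toy byte-level version of the copy (ignoring the bit-packing for now; `CopyMatch` is my own illustration, not code from the game):

```cpp
#include <cstddef>
#include <string>

// Toy illustration of an LZSS back-reference: append `length` bytes to
// `output`, starting `distance` bytes before the current end. Copying
// one byte at a time is deliberate: the match may overlap the bytes it
// is itself producing, which is how LZSS encodes repeats cheaply.
void CopyMatch(std::string& output, size_t distance, size_t length) {
    const size_t start = output.size() - distance;
    for (size_t i = 0; i < length; ++i) {
        output.push_back(output[start + i]);
    }
}
```

So "abcabcabc" could be stored as three literals plus a single (distance 3, length 6) pair, even though the window only contains "abc" when the pair is decoded.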
Where it gets a bit tricky, and somewhat unusual, is in how LZSS distinguishes between those two cases:
By using a single bit as a flag that indicates whether the next chunk is a literal byte or an (offset, length) pair.
This means that we're reading data in either 9-bit (1+8) or 18-bit (1+13+4, I'll explain those numbers later) chunks, which is not byte-aligned.
But computers really like things being byte-aligned :( So much so, in fact, that there are variations of LZSS that use a whole byte as the flag to avoid being unaligned. The version that wipEout uses doesn't do that though, so we'll have to do a bit of gymnastics to read things bit-by-bit.
Back to those parameters for a second, though. What are they? How do they work?
There are three main ones:
- The number of bits to use for the offset (or index) into the look-behind window. This also dictates the length of the look-behind window: if there are n_offset bits, we can only possibly index 2^n_offset positions with them, so that's how big our window should be.
- The number of bits to use for the length of a match.
- The minimum size of a match. This allows us to indicate longer matches using fewer bits, by simply adding the minimum size to the decoded match length. It also helps us avoid storing matches that are so short that it would have been better to just store a literal byte instead.
In the wipEout source, these numbers are:
constexpr uint8_t kNumIndexBits = 13;
constexpr uint8_t kNumLengthBits = 4;
// 2^13 = 8192 bytes
constexpr uint16_t kWindowSize = 1 << kNumIndexBits;
// A match costs 1 + 13 + 4 = 18 bits, a literal costs 1 + 8 = 9,
// so a match has to cover at least 18 / 9 = 2 bytes to pay off.
constexpr uint8_t kBreakEven = ((1 + kNumIndexBits + kNumLengthBits) / 9);
There are some more minutiae, like whether the index is a relative offset from the last byte in the look-behind window or an absolute position in that window, and where to start writing into said window, but I won't bore you with that. Instead, I'll bore you with the mechanics of reading bit-by-bit from a buffer that consists of bytes!
All we need is some masks, bitwise arithmetic, and patience:
std::vector<uint8_t> input_buffer;
// Always start at the leftmost bit of each byte
constexpr uint8_t kMaskStart = 1 << 7;
uint8_t input_bit_mask = kMaskStart;
size_t input_buffer_index = 0;
uint8_t current_input_byte = 0;
while (true) {
    if (input_bit_mask == kMaskStart) {
        // Read a new byte from our input buffer,
        // and advance input_buffer_index.
        current_input_byte = input_buffer[input_buffer_index++];
    }
    // Read one bit (this tells us whether to read a literal byte
    // or an (index, length) pair), then advance the input mask.
    bool read_literal_byte = current_input_byte & input_bit_mask;
    input_bit_mask >>= 1;
    if (input_bit_mask == 0) {
        input_bit_mask = kMaskStart;
    }
    if (read_literal_byte) {
        // We encountered a 1-bit in `current_input_byte`, so we read 8 bits
        // and write them to `output_buffer`.
        //
        // Note that these bits **are not** necessarily byte-aligned
        // (because the other branch of this if statement might read a
        // number of bits that's not a multiple of eight), so we might
        // have to read a new byte from `input_buffer` along the way.
        uint8_t aligned_input_byte_mask = kMaskStart;
        uint8_t aligned_input_byte = 0;
        // We shift `aligned_input_byte_mask` to the right in the loop
        // below, so this terminates after we've read all 8 bits
        // of the input byte!
        while (aligned_input_byte_mask != 0) {
            if (input_bit_mask == kMaskStart) {
                // We may need to fetch another byte from the input
                // buffer as we read bits.
                current_input_byte = input_buffer[input_buffer_index++];
            }
            // If the current bit from the input buffer is set,
            // then we also set the corresponding bit in
            // `aligned_input_byte`.
            if (current_input_byte & input_bit_mask) {
                aligned_input_byte |= aligned_input_byte_mask;
            }
            aligned_input_byte_mask >>= 1;
            input_bit_mask >>= 1;
            if (input_bit_mask == 0) {
                input_bit_mask = kMaskStart;
            }
        }
        // Insert the byte that we just read into both our output
        // and the look-behind buffer.
        output_buffer[output_buffer_index++] = aligned_input_byte;
        window[current_window_index] = aligned_input_byte;
        // The look-behind buffer is a ringbuffer!
        current_window_index = mod_window(current_window_index + 1);
    } else {
        // Code for handling (index, length) pairs goes here
    }
}
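And for completeness, a sketch of what that else branch might look like. I've factored the MSB-first bit reading from the main loop into a helper, and I'm guessing at two details that the original code pins down: the index being an absolute position in the window, and the minimum match length (here assumed to equal kBreakEven, i.e. 2).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kNumIndexBits = 13;
constexpr uint8_t kNumLengthBits = 4;
constexpr uint8_t kMaskStart = 1 << 7;
// Assumption: the minimum match length equals the break-even point.
constexpr uint32_t kMinMatchLength = 2;

// The same MSB-first bit reading as the main loop, wrapped in a helper.
struct BitReader {
    const std::vector<uint8_t>& input;
    size_t byte_index = 0;
    uint8_t bit_mask = kMaskStart;
    uint8_t current = 0;

    uint32_t ReadBits(int count) {
        uint32_t value = 0;
        for (int i = 0; i < count; ++i) {
            if (bit_mask == kMaskStart) {
                current = input[byte_index++];
            }
            value = (value << 1) | ((current & bit_mask) ? 1u : 0u);
            bit_mask >>= 1;
            if (bit_mask == 0) {
                bit_mask = kMaskStart;
            }
        }
        return value;
    }
};

// One (index, length) pair: read both fields, then copy from the
// look-behind window into the output, feeding each copied byte back
// into the (ring-buffer) window as we go.
void DecodeMatch(BitReader& reader, std::vector<uint8_t>& window,
                 size_t& window_index, std::vector<uint8_t>& output) {
    const uint32_t index = reader.ReadBits(kNumIndexBits);
    const uint32_t length = reader.ReadBits(kNumLengthBits) + kMinMatchLength;
    for (uint32_t i = 0; i < length; ++i) {
        const uint8_t byte = window[(index + i) % window.size()];
        output.push_back(byte);
        window[window_index] = byte;
        window_index = (window_index + 1) % window.size();
    }
}
```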
Got all that? Maybe I should add some nice (animated?) illustrations of the buffers and masks here, but, well, that would require me to write JavaScript...
What's important is, we have LZSS decompression now. But how do we test it? We don't know what the uncompressed data really looks like, after all.
That said, we do know the exact length of the uncompressed data, from the CmpHeader!
And given how LZSS works, it's kind of unlikely that we incorrectly expanded a compressed block to the correct size.
So, since we don't know how to actually parse a TIMList into anything useful just yet, the best we can do is check the uncompressed length against the header:
TEST(LoadCmp, LoadsCmp) {
    TestAllocator a;
    TIMList list = LoadTimListFromCmp("test_data/test.cmp", a);
    // Not much to validate here. The load function checks for
    // consistency (unpacked LZSS block size vs expected total texture
    // size), which is good enough for now.
    ASSERT_NE(list.mem_base, nullptr);
    ASSERT_EQ(list.mem_offset_size, 275);
}
I'll probably come back and expand this once we do know what a TIMList really is, but for now, this is good enough.