Optimizations in ZX0 compression

17 Jul 2025
Amiga Style
Demo Scene

As mentioned over at Bluesky, I’m currently on sick leave and taking the time to make myself more familiar with ZX0 compression – also in preparation of another 512 byte entry for Nordlicht, maybe. :)

Thankfully, ZX0 author Einar Saukas has provided reference implementations in several programming languages (C, Kotlin, Java), and with that, I was able to quickly peek what’s happening behind the scenes, gather diagnostic logs, etc.

Summary: I saved two bits for my use case, and two bytes for everyone!

Edit: The 68000 decompression routine referenced below is a port by Emmanuel Marty, not a reference implementation by Einar Saukas. Thanks to both of you for your work!

That’s a long end marker!

One thing that has bothered me was that my compressed data always ended with the bytes 0x5556 or 0xAAAB. Can’t this be shorter?

This has to do with how ZX0 marks the end of the data stream.

ZX0 has three modes:
- Output the next n bytes literally (uncompressed)
- Copy n bytes from last position
- Copy n bytes from a specific position
ZX0 knows two flavors of bytes, so to speak:
- Whole data bytes (uncompressed, direct data)
- Bit-packed bytes (control bits and specially-encoded values)
- Both are interspersed in the compressed data
- ZX0 will try to cram as many bits as possible into each non-data byte; you can think of all non-data bytes together as a single stream of bits
Byte lenghts and offsets use a Elias gamma coding scheme. This way, common short values use fewer bits in the control data stream. For example:
- 1 = 1
- 2 = 011
- 3 = 001
- 4 = 01011
- 5 = 01001
- 6 = 00011
- 7 = 00001
- 8 = 0101011
- etc.
The “copy n specific bytes” mode uses two values to calculate the start offset:
- An Elias-coded value (variable bit length)
- An unchanged, literal byte taken from the data (8 bits)
- Combined into eliasValue << 8 | byteValue
When the “copy n specific bytes” mode encounters an Elias value of 256, this marks the end of the compressed stream. (The compressor outputs that value when it’s done.)
The value 256 is where the 0x5556 and 0xAAAB values come from.
- 256 encodes to 01010101010101011
- That’s 0xAAAB in hexadecimal
- Or 0x15556 when shifted one bit to the left

Curious, I used the decompression code to trace what other Elias values are used in my data for the “copy from” position, in addition to the end marker. I only saw 5 distinct values being used:

Elias value	Decoded	Meaning
`1`	1	Copy n bytes from position `0x01..`
`011`	2	Copy n bytes from position `0x02..`
`001`	3	Copy n bytes from position `0x03..`
`01011`	4	Copy n bytes from position `0x04..`
`01010101010101011`	256	End marker

So if my data only uses start positions up to 0x04.. – could I just use a smaller value for the end marker? Maybe 7, as it uses the same number of Elias bits as 4 and would leave some leeway.

01010101010101011 = old end marker, 256
00001 = new end marker, 7
12 bits saved!

But I couldn’t just replace the new end-marker value in the 68000 decompression code. It doesn’t even use the value 256 directly, but rather checks if the byte portion ends up as 0x00 (the beq branch is taken, then):

.get_offset:   moveq #-2,d0         ; initialize value to $fe
               bsr.s .elias_loop    ; read high byte of match offset
               addq.b #1,d0         ; obtain negative offset high byte
               beq.s .done          ; exit if EOD marker

Bummer! I did, however, change the beq to a bge check (greater or equal than 0, signed). For this check to succeed, we only need an Elias value of 129 instead of 256:

End marker	Effect of addq.b #1	End detection
256	`0xFFFFFEFF` becomes `0xFFFFFE00`	beq is true
129	`0xFFFFFF7E` becomes `0xFFFFFF7F`	bge is true

This limits the copy-from offsets to the 0x80.. range (acceptable for my data), but sadly, 129 is only 2 bits shorter when Elias-encoded.

01010101010101011 = 256, using 17 bits
010101010101001 = 129, using 15 bits

When the stars align right, we might chop off the last byte of our compressed data – in case it was only used for the lower Elias bits of 256 before. Not a huge win for all that hassle…

Needs more research: Maybe there’s another way of encoding the “end of stream” condition in 68000-assembly-friendly form. Especially when it can be be tailored and hard-coded to my specific data. For now, I’m happy with the 2 bits saved! :)

An unexpected save

The bigger surprise came when I was looking at the 68000 decompression code some more.

In the loop to copy literal bytes, it does this:

.literals:  bsr.s .get_elias   ; read number of literals to copy
            subq.l #1,d0       ; dbf will loop until d0 is -1, not 0
.copy_lits: move.b (a0)+,(a1)+ ; copy literal byte
            dbf d0,.copy_lits  ; loop for all literal byte

This gets assembled to:

.literals:  bsr.s .get_elias   ;  6136 
            subq.l #1,d0       ;  5380 
.copy_lits: move.b (a0)+,(a1)+ ;  12d8 
            dbf d0,.copy_lits  ;  51c8   fffc

That dbf instruction is huge, requiring an extra word to encode the branching target (here: 0xFFFC = -4 signed).

Now, for short loops a dbf instruction can be replaced with a subq and a conditional branch, like this:

dbf code	subq version
; ; d0 = # of iterations ; subq.l #1,d0 .loop ; ; do stuff ; dbf d0,.loop ; 51c8 fffc ; ; d0 is -1 now ;	; ; d0 = # of iterations ; subq.l #1,d0 .loop ; ; do stuff ; subq.w #1,d0 ; 5340 bge.b .loop ; 6cfa ; ; d0 is -1 now ;

dbf code

subq version

      ;
      ; d0 = # of iterations
      ;
      subq.l #1,d0
.loop ;
      ; do stuff
      ;
      dbf d0,.loop ;  51c8   fffc 

      ;
      ; d0 is -1 now
      ;

      ;
      ; d0 = # of iterations
      ;
      subq.l #1,d0
.loop ;
      ; do stuff
      ;
      subq.w #1,d0 ;  5340 
      bge.b  .loop ;  6cfa 
      ;
      ; d0 is -1 now
      ;

Of course, that’s still the same size – both versions take up 4 bytes.

As noted in the decompressor code comment, a dbf will always loop until the counter register becomes -1. If we want 23 loop executions, we need to put 23-1 = 22 into register d0. (Hence the subq #1 before the loop.)

With subq and a branch, however, we can change the branch condition to end at zero. We’ll be getting 23 executions when d0 is 23, and don’t need to subtract 1 before the loop! Only difference: After the loop, d0 will contain 0, not -1.

Is it important if d0 ends up as 0 or -1? Let’s check what happens after the loop!

.literals:  bsr.s .get_elias   ; read number of literals to copy
            subq.l #1,d0       ; dbf will loop until d0 is -1, not 0
.copy_lits: move.b (a0)+,(a1)+ ; copy literal byte
            dbf d0,.copy_lits  ; loop for all literal bytes

            add.b d1,d1        ; read 'match or rep-match' bit      
            bcs.s .get_offset  ; if 1: read offset, if 0: rep-match

The code either continues or jumps elsewhere. This is what happens in each case:

“Match or rep-match” = 0	“Match or rep-match” = 1
add.b d1,d1 bcs.s .get_offset ; ; branch not taken ; .rep_match: bsr.s .get_elias ... .get_elias: moveq #1,d0 ...	add.b d1,d1 bcs.s .get_offset ; ; branch taken ; .get_offset: moveq #-2,d0 ...

Turns out d0 will always be overwritten! Thus, we can rewrite the loop like so:

            subq.l #1,d0       ; 5380
.copy_lits: move.b (a0)+,(a1)+ ; 12d8
            dbf d0,.copy_lits  ; 51c8 fffc
            subq.w #1,d0       ; 5340
            bgt.s  .copy_lits  ; 6efa

Cool! We just removed unnecessary code from the decompression routine that has been sitting there the whole time! Everyone using unzx0_68000.S can save two whopping bytes now.

That can make quite the difference in a 512 byte intro! :)

I need to stop here for now, as I’m getting more excited than my doctor would recommend. Time for a nap, and let’s hope I didn’t overlook anything here!

Rotozoom craze What’s Cool? V

Optimizations in ZX0 compression

That’s a long end marker!

An unexpected save

What the heckmeck?

Blog

Categories