As mentioned over at Bluesky, I’m currently on sick leave and taking the time to make myself more familiar with ZX0 compression – also in preparation of another 512 byte entry for Nordlicht, maybe. :)
Thankfully, ZX0 author Einar Saukas has provided reference implementations in several programming languages (C, Kotlin, Java), and with that, I was able to quickly peek what’s happening behind the scenes, gather diagnostic logs, etc.
Summary: I saved two bits for my use case, and two bytes for everyone!
Edit: The 68000 decompression routine referenced below is a port by Emmanuel Marty, not a reference implementation by Einar Saukas. Thanks to both of you for your work!
That’s a long end marker!
One thing that has bothered me was that my compressed data always
ended with the bytes 0x5556
or 0xAAAB
. Can’t this be shorter?
This has to do with how ZX0 marks the end of the data stream.
- ZX0 has three modes:
- Output the next n bytes literally (uncompressed)
- Copy n bytes from last position
- Copy n bytes from a specific position
- ZX0 knows two flavors of bytes, so to speak:
- Whole data bytes (uncompressed, direct data)
- Bit-packed bytes (control bits and specially-encoded values)
- Both are interspersed in the compressed data
- ZX0 will try to cram as many bits as possible into each non-data byte; you can think of all non-data bytes together as a single stream of bits
- Byte lenghts and offsets use a Elias gamma coding
scheme.
This way, common short values use fewer bits in the control data stream. For example:
- 1 =
1
- 2 =
011
- 3 =
001
- 4 =
01011
- 5 =
01001
- 6 =
00011
- 7 =
00001
- 8 =
0101011
- etc.
- 1 =
- The “copy n specific bytes” mode uses two values to calculate the start offset:
- An Elias-coded value (variable bit length)
- An unchanged, literal byte taken from the data (8 bits)
- Combined into
eliasValue << 8 | byteValue
- When the “copy n specific bytes” mode encounters an Elias value of 256, this marks the end of the compressed stream. (The compressor outputs that value when it’s done.)
- The value 256 is where the
0x5556
and0xAAAB
values come from.- 256 encodes to
01010101010101011
- That’s
0xAAAB
in hexadecimal - Or
0x15556
when shifted one bit to the left
- 256 encodes to
Curious, I used the decompression code to trace what other Elias values are used in my data for the “copy from” position, in addition to the end marker. I only saw 5 distinct values being used:
Elias value | Decoded | Meaning |
---|---|---|
1 | 1 | Copy n bytes from position 0x01.. |
011 | 2 | Copy n bytes from position 0x02.. |
001 | 3 | Copy n bytes from position 0x03.. |
01011 | 4 | Copy n bytes from position 0x04.. |
01010101010101011 | 256 | End marker |
So if my data only uses start positions up to 0x04..
– could I just use a
smaller value
for the end marker? Maybe 7, as it uses the same number of Elias bits as 4 and would leave some leeway.
01010101010101011
= old end marker, 25600001
= new end marker, 7- 12 bits saved!
But I couldn’t just replace the new end-marker value in the 68000 decompression code.
It doesn’t even use the value 256 directly, but rather
checks if the byte portion
ends up as 0x00
(the beq
branch is taken, then):
.get_offset: moveq #-2,d0 ; initialize value to $fe bsr.s .elias_loop ; read high byte of match offset addq.b #1,d0 ; obtain negative offset high byte beq.s .done ; exit if EOD marker
Bummer! I did, however, change the
beq
to a bge
check (greater or equal than 0, signed). For
this check to succeed, we only need an Elias value of 129 instead of 256:
End marker | Effect of addq.b #1 | End detection |
---|---|---|
256 | 0xFFFFFEFF becomes 0xFFFFFE00 | beq is true |
129 | 0xFFFFFF7E becomes 0xFFFFFF7F | bge is true |
This limits the copy-from offsets to the 0x80..
range (acceptable for my data), but sadly,
129 is only 2 bits shorter when Elias-encoded.
01010101010101011
= 256, using 17 bits010101010101001
= 129, using 15 bits
When the stars align right, we might chop off the last byte of our compressed data – in case it was only used for the lower Elias bits of 256 before. Not a huge win for all that hassle…
Needs more research: Maybe there’s another way of encoding the “end of stream” condition in 68000-assembly-friendly form. Especially when it can be be tailored and hard-coded to my specific data. For now, I’m happy with the 2 bits saved! :)
An unexpected save
The bigger surprise came when I was looking at the 68000 decompression code some more.
In the loop to copy literal bytes, it does this:
.literals: bsr.s .get_elias ; read number of literals to copy subq.l #1,d0 ; dbf will loop until d0 is -1, not 0 .copy_lits: move.b (a0)+,(a1)+ ; copy literal byte dbf d0,.copy_lits ; loop for all literal byte
This gets assembled to:
.literals: bsr.s .get_elias ; 6136 subq.l #1,d0 ; 5380 .copy_lits: move.b (a0)+,(a1)+ ; 12d8 dbf d0,.copy_lits ; 51c8 fffc
That dbf
instruction is huge, requiring an
extra word to encode the branching target (here: 0xFFFC
= -4 signed).
Now, for short loops a dbf
instruction can be replaced with a subq
and a conditional branch,
like this:
dbf code | subq version |
---|---|
; ; d0 = # of iterations ; subq.l #1,d0 .loop ; ; do stuff ; dbf d0,.loop ; 51c8 fffc ; ; d0 is -1 now ; | ; ; d0 = # of iterations ; subq.l #1,d0 .loop ; ; do stuff ; subq.w #1,d0 ; 5340 bge.b .loop ; 6cfa ; ; d0 is -1 now ; |
Of course, that’s still the same size – both versions take up 4 bytes.
As noted in the decompressor code comment, a dbf
will always loop until the
counter register becomes -1. If we want 23 loop executions, we need to put 23-1 = 22 into register d0.
(Hence the subq #1
before the loop.)
With subq
and a branch, however, we can
change the branch condition to end at zero. We’ll be getting 23 executions when d0 is 23,
and don’t need to subtract 1 before the loop!
Only difference: After the loop, d0 will contain 0, not -1.
Is it important if d0 ends up as 0 or -1? Let’s check what happens after the loop!
.literals: bsr.s .get_elias ; read number of literals to copy subq.l #1,d0 ; dbf will loop until d0 is -1, not 0 .copy_lits: move.b (a0)+,(a1)+ ; copy literal byte dbf d0,.copy_lits ; loop for all literal bytes add.b d1,d1 ; read 'match or rep-match' bit bcs.s .get_offset ; if 1: read offset, if 0: rep-match
The code either continues or jumps elsewhere. This is what happens in each case:
“Match or rep-match” = 0 | “Match or rep-match” = 1 |
---|---|
add.b d1,d1 bcs.s .get_offset ; ; branch not taken ; .rep_match: bsr.s .get_elias ... .get_elias: moveq #1,d0 ... | add.b d1,d1 bcs.s .get_offset ; ; branch taken ; .get_offset: moveq #-2,d0 ... |
Turns out d0 will always be overwritten! Thus, we can rewrite the loop like so:
subq.l #1,d0;5380.copy_lits: move.b (a0)+,(a1)+ ; 12d8dbf d0,.copy_lits;51c8fffcsubq.w #1,d0 ; 5340 bgt.s .copy_lits ; 6efa
Cool! We just removed unnecessary code from the decompression routine that has been sitting there the whole time! Everyone using unzx0_68000.S can save two whopping bytes now.
That can make quite the difference in a 512 byte intro! :)
I need to stop here for now, as I’m getting more excited than my doctor would recommend. Time for a nap, and let’s hope I didn’t overlook anything here!