The ZX0 unpack routine got shorter again, and there are savings to be had in my latest 512 byte intro. Best of all: I didn’t do anything myself! :)
This is a follow-up to Optimizations in ZX0 compression.
unzx0_68000.S now 4 bytes shorter
In the last ZX0 post, I wrote about saving two bytes in the ZX0 decompression routine for 68000 processors. We were able to replace the copy-literals loop with a slightly shorter version:
    assembler code                          binary output
    ------------------                      -------------
    -               subq.l  #1,d0           ; 5380
    - .copy_lits:   move.b  (a0)+,(a1)+     ; 12d8
    -               dbf     d0,.copy_lits   ; 51c8 fffc
    + .copy_lits:   move.b  (a0)+,(a1)+     ; 12d8
    +               subq.w  #1,d0           ; 5340
    +               bgt.b   .copy_lits      ; 6efa
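To see why the shorter loop can stand in for the old one, here is a small Python model of both variants. It is only a sketch: the function names are made up for illustration, and it merely counts how many bytes each loop would copy for a given length in d0. One caveat the model makes visible: `subq.w`/`bgt` is a signed word test, so the two loops only agree for lengths from 1 to 32767, which is presumably far more than a tiny intro ever needs.

```python
def copy_with_dbf(count):
    """Old loop: subq.l #1,d0 / move.b (a0)+,(a1)+ / dbf d0,.copy_lits"""
    d0 = (count - 1) & 0xFFFF          # dbf only looks at the low word of d0
    copied = 0
    while True:
        copied += 1                    # move.b (a0)+,(a1)+
        d0 = (d0 - 1) & 0xFFFF         # dbf decrements the low word ...
        if d0 == 0xFFFF:               # ... and falls through when it hits -1
            break
    return copied


def copy_with_bgt(count):
    """New loop: move.b (a0)+,(a1)+ / subq.w #1,d0 / bgt.b .copy_lits"""
    d0 = count & 0xFFFF
    copied = 0
    while True:
        copied += 1                    # move.b (a0)+,(a1)+
        signed = d0 - 0x10000 if d0 & 0x8000 else d0
        result = signed - 1            # subq.w #1,d0 (true signed result)
        d0 = result & 0xFFFF
        if result <= 0:                # bgt branches only if the result is > 0
            break
    return copied


# Both loops copy exactly `count` bytes for small, positive lengths.
for n in (1, 2, 3, 100, 32767):
    assert copy_with_dbf(n) == copy_with_bgt(n) == n
```

Beyond 32767 the models diverge: the `dbf` version still copies the full length, while the `bgt` version stops after a single byte.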
platon42/Desire spotted another optimization, saving two more bytes:
    - .do_copy_offs: move.l  a1,a2               ; 2449
    -                add.l   d2,a2               ; d5c2
    - .copy_match:   move.b  (a2)+,(a1)+         ; 12da
    -                dbf     d0,.copy_match      ; 51c8 fffc
    + .do_copy_offs:
    + .copy_match:   move.b  (a1,d2.l),(a1)+     ; 12f1 2800
    +                dbf     d0,.copy_match      ; 51c8 fffa
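The trick works because the source address of `move.b (a1,d2.l),(a1)+` is computed from a1 before the destination post-increment happens, so a1 + d2 keeps pointing the same distance behind the write position on every iteration. Since the old code computed a2 = a1 + d2 for a back-reference, d2 has to be a negative offset. A rough Python model of that copy, with a made-up function name and a `bytearray` standing in for the output buffer:

```python
def copy_match(out, offset, length):
    """Append `length` bytes, each read `-offset` bytes behind the write position.

    `out` plays the role of the memory written so far (a1 is its end),
    `offset` plays the role of d2 and must be negative.
    """
    for _ in range(length):
        # move.b (a1,d2.l),(a1)+ : read at a1 + d2, write at a1, then a1 += 1
        out.append(out[len(out) + offset])
    return out


# Overlapping matches repeat the most recent bytes, byte by byte.
assert copy_match(bytearray(b"ab"), -2, 6) == bytearray(b"abababab")
assert copy_match(bytearray(b"x"), -1, 3) == bytearray(b"xxxx")
```

The model takes the number of bytes directly; the real loop uses `dbf`, which runs one more time than the value in d0.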
Awesome. Great news for tiny productions! :)
Brute-forcing your way to insanity
This is not about the ZX0 decompressor per se (i.e. a “hard win” that you will always get when using ZX0), but rather some specific byte-mangling to make the compressed data for Temba, seine Arme weit! even smaller.
As mentioned in the write-up, I resorted to brute-forcing certain byte values and execution orders during the last hours before the deadline. This is time-consuming and the results are not always obvious, but I managed to squeeze out some bytes I could use to improve that 512 byte intro (restore the mouse pointer, exit to AmigaDOS cleanly).
phoyd, who I told about this stuff at Nordlicht, surprisingly wasn’t bored to death (not completely at least), but instead dusted off his 68000 coding skills after the party and found some more optimizations. I love how seemingly random these are!
    Party release: baseline
    -----------------------
    Executable header ............  36
    ZX0 unpacker .................  88
    ZX0 payload .................. 388
    --> 512 bytes in total

    Optimization 1: target address
    ------------------------------
    -   move.l  .opc(pc),a4         ; in the unpacker
    +   move.l  #$6664e,a4

    -   UNPACK_TARGET = $5e14e      ; in the intro (source)
    -   SINTAB        = $58000
    +   UNPACK_TARGET = $6664e
    +   SINTAB        = $64000

    Executable header ............  36
    ZX0 unpacker .................  90
    ZX0 payload .................. 384
    --> 2 bytes saved despite larger unpacker code!

    Optimization 2: volume
    ----------------------
    -   move.l  #$00400000,$dff0a8  ; set AUD0VOL+DAT, vol=64 ($40)
    +   move.l  #$00440000,$dff0a8  ; only bit 6 needed for max volume,
    +                               ; other bits freely assignable

    Executable header ............  36
    ZX0 unpacker .................  90
    ZX0 payload .................. 383
    --> Another byte saved!
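The volume tweak boils down to a tiny search: keep bit 6 set, try every combination of the remaining bits, and keep whichever value compresses best. Below is a minimal sketch of that idea. It is not the tooling actually used for the intro: `zlib` stands in for the ZX0 packer (so the sizes it reports mean nothing for ZX0), and the function name and the toy payload are made up for illustration.

```python
import zlib


def best_volume_byte(payload, pos):
    """Try every byte value with bit 6 set at `pos`.

    Returns (compressed_size, value) for the smallest result.
    zlib is only a stand-in here; a real run would call the ZX0 packer.
    """
    best = None
    for value in range(256):
        if not value & 0x40:           # bit 6 must stay set for max volume
            continue
        candidate = bytearray(payload)
        candidate[pos] = value
        size = len(zlib.compress(bytes(candidate), 9))
        if best is None or size < best[0]:
            best = (size, value)
    return best


# Toy payload: the bytes of "move.l #$00400000,$dff0a8", repeated a few times.
toy = bytes.fromhex("23fc 0040 0000 00df f0a8") * 4
size, value = best_volume_byte(toy, 3)     # offset 3 holds the $40 byte
print(f"best value ${value:02x}, {size} bytes after compression")
```

The same loop shape applies to the target address: iterate over the candidate addresses, rebuild, pack, and keep the smallest result. It just takes longer per candidate.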
Together with platon’s decompressor enhancement, the intro now fits into 508 bytes! One of those is a padding byte, so there is room for five more compressed bytes.
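For anyone who wants to check the arithmetic, here is the byte budget spelled out. The unpacker size of 88 is an inference (the 90 bytes from the listing minus the two bytes platon saved), not a figure quoted anywhere above.

```python
LIMIT = 512

header = 36
unpacker = 90 - 2      # assumption: listing value minus platon's two bytes
payload = 383

unpadded = header + unpacker + payload   # actual content
padded = unpadded + 1                    # plus the one padding byte
headroom = LIMIT - unpadded              # bytes the payload may still grow

print(unpadded, padded, headroom)        # 507 508 5
```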
The drawback: These hyperoptimizations are really brittle. I tried adding some color flashing during the pre-calc time, like this:
    +   move.w  d5,$dff180          ; 33c5 00df f180 (flash background)

But wherever I try to insert this, whichever register I use, the compressed size suddenly grows by 9 bytes or more, exceeding the 512-byte limit. I guess I would need to brute-force a way to use the extra space that we gained by brute-forcing…