heckmeck!

Nerd content and
cringe since 1999

Alexander Grupe
Losso/ATW

The ZX0 unpack routine got shorter again, and there are savings to be had in my latest 512-byte intro. Best of all: I didn’t do anything myself! :)

This is a follow-up to Optimizations in ZX0 compression.

unzx0_68000.S now 4 bytes shorter

In the last ZX0 post, I wrote about saving two bytes in the ZX0 decompression routine for 68000 processors. We were able to replace the copy-literals loop with a slightly shorter version:

              assembler code       binary output
              ------------------   -------------

-             subq.l #1,d0       ; 5380
- .copy_lits: move.b (a0)+,(a1)+ ; 12d8
-             dbf d0,.copy_lits  ; 51c8 fffc

+ .copy_lits: move.b (a0)+,(a1)+ ; 12d8
+             subq.w #1,d0       ; 5340
+             bgt.b  .copy_lits  ; 6efa
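To convince yourself that both loops copy the same number of bytes, here is a rough Python model of the two variants. It is only a sketch: it tracks the low word of d0 and nothing else, and it assumes a literal count between 1 and 32767 (beyond that, the signed bgt test behaves differently from dbf). The function names are made up for this illustration.

```python
# Opcode words as listed in the diff above; each word is two bytes.
OLD_WORDS = "5380 12d8 51c8 fffc".split()
NEW_WORDS = "12d8 5340 6efa".split()

def copy_lits_dbf(n):
    """Old loop: subq.l #1,d0 / move.b / dbf. Returns the number of bytes copied."""
    d0 = (n - 1) & 0xFFFF           # subq.l #1,d0 (dbf only looks at the low word)
    copied = 0
    while True:
        copied += 1                 # move.b (a0)+,(a1)+
        d0 = (d0 - 1) & 0xFFFF      # dbf decrements the low word ...
        if d0 == 0xFFFF:            # ... and falls through once it hits -1
            return copied

def copy_lits_bgt(n):
    """New loop: move.b / subq.w #1,d0 / bgt.b. Returns the number of bytes copied."""
    d0 = n & 0xFFFF
    copied = 0
    while True:
        copied += 1                 # move.b (a0)+,(a1)+
        d0 = (d0 - 1) & 0xFFFF      # subq.w #1,d0
        signed = d0 - 0x10000 if d0 & 0x8000 else d0
        if signed <= 0:             # bgt.b only branches while the result is > 0
            return copied

old_size = 2 * len(OLD_WORDS)       # 8 bytes
new_size = 2 * len(NEW_WORDS)       # 6 bytes
```

In this model, both functions return n for every count in the assumed range, and the word lists account for the two bytes saved.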

platon42/Desire spotted another optimization, saving two more bytes:

- .do_copy_offs: move.l a1,a2           ; 2449
-                add.l d2,a2            ; d5c2
- .copy_match:   move.b (a2)+,(a1)+     ; 12da
-                dbf d0,.copy_match     ; 51c8 fffc

+ .do_copy_offs:
+ .copy_match:   move.b (a1,d2.l),(a1)+ ; 12f1 2800
+                dbf d0,.copy_match     ; 51c8 fffa
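The trick is that the indexed addressing mode recomputes "write pointer plus (negative) offset" on every iteration, so the separate read pointer a2 is no longer needed: four opcode words instead of five. A small Python model, assuming d2 holds a negative offset and d0 holds the match length minus one (the function names are hypothetical, and the output buffer stands in for memory below a1):

```python
def copy_match_pointer(out, offset, d0):
    """Old version: a2 = a1 + d2, then move.b (a2)+,(a1)+ in a dbf loop."""
    a2 = len(out) + offset              # move.l a1,a2 / add.l d2,a2
    for _ in range(d0 + 1):             # dbf runs the loop body d0+1 times
        out.append(out[a2])             # move.b (a2)+,(a1)+
        a2 += 1
    return out

def copy_match_indexed(out, offset, d0):
    """New version: move.b (a1,d2.l),(a1)+ in a dbf loop."""
    for _ in range(d0 + 1):
        # The source address is formed from a1 before a1 is post-incremented.
        out.append(out[len(out) + offset])
    return out

# Overlapping match: offset -2, six bytes copied from a two-byte history.
old = copy_match_pointer(bytearray(b"ab"), -2, 5)
new = copy_match_indexed(bytearray(b"ab"), -2, 5)
```

Both variants handle overlapping matches the same way, because the distance between read and write position stays constant in either case.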

Awesome. Great news for tiny productions! :)

Brute-forcing your way to insanity

This part is not about the ZX0 decompressor per se (i.e. it is not a “hard win” that you will always get when using ZX0), but about some specific byte-mangling to make the compressed data for Temba, seine Arme weit! even smaller.

As mentioned in the write-up, I resorted to brute-forcing certain byte values and execution orders during the last hours before the deadline. This is time-consuming and the results are not always obvious, but I managed to squeeze out some bytes I could use to improve that 512-byte intro (restore the mouse pointer, exit to AmigaDOS cleanly).
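In spirit, the brute-forcing boils down to something like the following Python sketch. Note the assumptions: zlib serves as a stand-in packer here (it is not ZX0, so the actual numbers will differ), the payload is invented, and `packed_size` and `brute_force_byte` are hypothetical helper names, not part of any real tool.

```python
import zlib

def packed_size(data: bytes) -> int:
    # Stand-in for the real packer: zlib at maximum level, NOT ZX0.
    return len(zlib.compress(data, 9))

def brute_force_byte(data: bytes, index: int, candidates):
    """Try every candidate value at data[index]; return (best_size, best_value)."""
    best = None
    for value in candidates:
        trial = bytearray(data)
        trial[index] = value
        size = packed_size(bytes(trial))
        if best is None or size < best[0]:
            best = (size, value)
    return best

# Invented payload in which the byte at index 5 is a "don't care" value.
payload = bytes([0x33, 0xFC, 0x00, 0x40, 0x00, 0x99,
                 0x33, 0xFC, 0x00, 0x40, 0x00, 0x40] * 4)
best_size, best_value = brute_force_byte(payload, 5, range(256))
```

The search can never do worse than the starting point, since the original value is among the candidates. The expensive part in practice is that every free byte multiplies the number of packer runs.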

phoyd, whom I told about this stuff at Nordlicht, surprisingly wasn’t bored to death (not completely, at least). Instead, he dusted off his 68000 coding skills after the party and found some more optimizations. I love how seemingly random these are!

Party release: baseline
-----------------------
Executable header ............ 36
ZX0 unpacker ................. 88
ZX0 payload ................. 388

--> 512 bytes in total


Optimization 1: target address
------------------------------
-     move.l  .opc(pc),a4  ; in the unpacker
+     move.l  #$6664e,a4

- UNPACK_TARGET   = $5e14e ; in the intro (source)
- SINTAB          = $58000
+ UNPACK_TARGET   = $6664e
+ SINTAB          = $64000

Executable header ............ 36
ZX0 unpacker ................. 90
ZX0 payload ................. 384

--> 2 bytes saved despite larger unpacker code!


Optimization 2: volume
----------------------
-     move.l  #$00400000,$dff0a8 ; set AUD0VOL+DAT, vol=64 ($40)
+     move.l  #$00440000,$dff0a8 ; only bit 6 needed for max volume,
+                                ; other bits freely assignable

Executable header ............ 36
ZX0 unpacker ................. 90
ZX0 payload ................. 383

--> Another byte saved!
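Why this is safe can be sketched in a few lines of Python. The model below assumes the volume register works as described in the comment above: bit 6 forces maximum volume (64) and bits 0 to 5 then no longer matter. Only the low seven bits are modeled, and `effective_volume` is a name made up for this illustration.

```python
def effective_volume(aud_vol: int) -> int:
    """Playback volume under the assumed model: bit 6 forces the maximum of 64."""
    if aud_vol & 0x40:
        return 64
    return aud_vol & 0x3F

# Every value of the low seven bits that still plays at full volume.
candidates = [v for v in range(0x80) if effective_volume(v) == 64]
```

Under this model there are 64 interchangeable values from $40 to $7f, and $44 just happens to be one that compresses better in this particular intro.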

Together with platon’s decompressor enhancement, the intro now fits into 508 bytes! One of those is a padding byte, so there is room for five more compressed bytes.
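The byte accounting, spelled out in Python. One assumption is labeled in the code: the file size is rounded up to a multiple of four bytes, which is where the padding byte comes from.

```python
def padded(size: int, align: int = 4) -> int:
    """Round size up to the next multiple of align (assumed: 4-byte alignment)."""
    return (size + align - 1) // align * align

header, unpacker, payload = 36, 90, 383   # figures after optimization 2
unpacker -= 2                             # platon42's shorter match-copy loop

raw = header + unpacker + payload         # bytes actually used
on_disk = padded(raw)                     # size including the padding byte
room = 512 - raw                          # bytes left below the 512-byte limit
```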

The drawback: These hyperoptimizations are really brittle. I tried adding some color flashing during the pre-calc time, like this:

+     move.w  d5,$dff180 ; 33c5 00df f180 (flash background)

But wherever I try to insert this, whichever register I use, the compressed size suddenly grows by 9 bytes or more, exceeding the 512-byte limit. I guess I would need to brute-force a way to use the extra space that we gained by brute-forcing…
