Geschenkalarm
The Vintage Computing Christmas Challenge (VC³) is about recreating a text image in code, on a platform of your choice and with as little code as possible. This year’s reference image was a gift box:
My Amiga entry for VC³ ended up with a code size of 64 bytes (code below). That was small enough for a 33rd place (I think) out of 86 participants in the low-level/assembly category.
Edit: There isn’t really a dedicated assembly category (yet?). Together with Python, APL, and BASIC entries, it’s a shared 38th place!
As usual, near the top the entries get mind-boggingly small and clever – be sure to check out all the entries on the demozoo page once they’re uploaded, or watch the results presentation on YouTube.
Initial idea
This time around, I wasn’t in the mood to wrestle with the ancient AmigaOS BCPL layer again – 16 bytes of code to print a single letter? No, thanks!
So, no text output in the console this year. Then I had an idea: Why not use an alert? You know, like the infamous guru meditation (Amiga’s blue screen of death, if you will).
That would be kind of cool and a very Amiga thing to bring to the competition. It might even be well-suited for this specific image!
How to alert
There are two methods to display error messages in a blinking red box:
- Alert (exec.library)
Reboots the system and displays a guru meditation with an error code provided by you; not very useful. - DisplayAlert (intuition.library)
Displays a custom message where you can position substrings with pixel precision and even write characters on top of each other (allowing for bold text and ASCII art effects). This is what we want!
DisplayAlert takes an alert number (can be used to trigger a reset), the total height and the message itself. The message data is in a custom format:
00 26 0f 'Software Failure.' 00 01 00 ea 0f 'Press left ...' 00 00 | | | | | | | +-- more substrings to come? | | +----- string terminator | +---------------------------- y (byte) +---------------------------------- x (word)
Plain and simple
The naïve implementation for the gift box would be:
move.l 368(a2),a6 ; intuition base from global vector lea message(pc),a0 move.l #184,d1 jmp -90(a6) ; DisplayAlert message: dc.b 0,234, 18,' \O/',0,1 dc.b 0,234, 26,'+--------+--------+',0,1 dc.b 0,234, 34,'! ! !',0,1 dc.b 0,234, 42,'! ! !',0,1 dc.b 0,234, 50,'! ! !',0,1 dc.b 0,234, 58,'! ! !',0,1 dc.b 0,234, 66,'! ! !',0,1 dc.b 0,234, 74,'! ! !',0,1 dc.b 0,234, 82,'! ! !',0,1 dc.b 0,234, 90,'! ! !',0,1 dc.b 0,234, 98,'+--------+--------+',0,1 dc.b 0,234,106,'! ! !',0,1 dc.b 0,234,114,'! ! !',0,1 dc.b 0,234,122,'! ! !',0,1 dc.b 0,234,130,'! ! !',0,1 dc.b 0,234,138,'! ! !',0,1 dc.b 0,234,146,'! ! !',0,1 dc.b 0,234,154,'! ! !',0,1 dc.b 0,234,162,'! ! !',0,1 dc.b 0,234,170,'+--------+--------+',0,0
Of course this is quite verbose and needs a whole 488 bytes of code and data. But the code to display everything is quite short already. It’s a start!
Having a working version is always great to get in the game. It’s so much easier to shave bytes down from there, rather than come up with clever optimizations in your head before the first version. In my head, at least! :)
Funnily enough, I already encountered a documentation bug at this point. DisplayAlert’s synopsis suggests that “height” is a word:
Response = DisplayAlert( AlertNumber, String, Height ) D0 D0 A0 D1 BOOL DisplayAlert( ULONG, UBYTE *, WORD );
But the value in d1 is passed on as a longword! Under Kickstart 1.x with fast memory, this register usually contains a 32-bit memory address when your program starts. If you set the height as a 16-bit word, the upper 16 bits remain and DisplayAlert will try to create a gigantic bitmap for the message box.
.main ... ; d1 = 00c0 546c move.w #184,d1 ; d1 = 00c0 00b8 jmp -90(a6) ; DisplayAlert with a height of 12,583,096
When that fails, it will return instantly and display nothing at all. Maybe it will also chuckle at you being all bamboozled, scratching your head in confusion.
Single characters
The main optimization idea: Use the fact that we are
working with x/y coordinates, and that we can output one
letter at a time. So instead of writing the gift box
line by line, using counters to decide when to write +--+
and when to write ! !
, we can
juggle with coordinates and use the symmetry of the image,
switching between +
, !
and +
along the way.
In order to to fail more efficiently when trying to be clever, I put together a quick’n’dirty JavaScript testbed. This way, I got faster turnaround times and had a debugging log for free. With a simple DisplayAlert emulation and lots of reloads, I came up with this scheme:
- Draw two lines at once
- Plot
-
at x,y - Plot
!
at y,x (x and y reversed) - Increment x by 1 column
- Repeat 18 times
- Plot
- Do that for three y values
- Draw lines at x,y
- Draw lines at x,y + 9 rows
- Draw lines at x,y + 18 rows
- Have the line-draw code switch to
+
every 9 positions - Include that pesky
\O/
bow as-is
Porting the JavaScript “pseudo code” to assembly language worked surprisingly well, and I was quickly down to 88 bytes.
In the course of a week I managed to trim down the code size bit by bit (or rather, word by word), but the basic approach stayed the same.
Commit | Size in bytes | |
---|---|---|
inlined JS proto | 88 | |
moveq, subq instead of .w | 82 | |
d1 ends up with height | 78 | |
move.w d0, lose addq #1,a0 | 76 | |
bgt, beq for "+" decision | 74 | |
count y down | 72 | |
abandon horiz centering | 70 | |
fill buffer backwards, count y up again | 68 | |
replace cmp.w, no gain | 68 | |
cooler with movem.w, but same size | 68 | |
dirty typo hack | 66 | |
different when-to-plus calculation | 66 | |
magic constant: use sum for "+" and line end | 64 |
Submitted version
; registers: d0.l = 1, d4.l = 1, a2 = globvec lea .bufend(pc),a0 ; fill backwards moveq #22,d2 ; y .2lines moveq #22,d1 ; x moveq #-36,d6 ; counter when to use "+" .19chrs moveq #'-',d7 ; start with "-" .chr movem.w d0/d1/d7,-(a0) ; |00 01|xx xx|yy|ch| <-- (a0) add.b d2,4(a0) ; |___d0|___d1|d2|d7| moveq #'!',d7 ; switch to "!" exg d1,d2 ; swap x/y add.w #3558,d6 ; use "+"? bcc.b .noplus ; overflows after 1, 19, 37 times moveq #'+',d7 .noplus neg.l d4 ; do .chr twice blt.b .chr addq.w #8,d1 ; next x tst.b d6 ; (-36+38*3558).b = 0 bne.b .19chrs add.b #8*9,d2 ; next y bge.b .2lines ; (22+8*9*2).b < 0 move.l 368(a2),a6 ; intuition base subq.w #6,a0 ; a0 = \O/, d1 = 174 jmp -90(a6) ; DisplayAlert dc.w 86 dc.b 14,'\O/' .bufend = *+6*2*19*3
The resulting executable file looks like this:
- OS overhead
Amiga executable hunk structure (36 bytes), with a hunk size of
0x1500
longwords to reserve extra space for the alert message buffer- Code
68000 machine code and data (64 bytes)
00: 0000 03f3 0000 0000 0000 0001 0000 0000 ................ 10: 0000 0000 0000 1500 0000 03e9 0000 0010 ................ 20: 41fa 02ea 7416 7216 7cdc 7e2d 48a0 c100 A...t.r.|.~-H... 30: d528 0004 7e21 c342 dc7c 0de6 6402 7e2b .(..~!.B.|..d.~+ 40: 4484 6de8 5041 4a06 66e0 d43c 0048 6cd6 D.m.PAJ.f..<.Hl. 50: 2c6a 0170 5d48 4eee ffa6 0056 0e5c 4f2f ,j.p]HN....V.\O/ 60: 0000 03f2 ....
And it looks like this:
Optimizations
I’ll keep this short, as there’s some discussion of 68000 assembly sizecoding tricks in the wrchmas pattern write-up already. But there’s one very dirty trick exclusive to this Chistmas present!
Edit: I noticed after the write-up that there’s an accidental optimization which is also very dirty, see: All memory is zero, right?
Known register contents
d0 and d4 both contain 1 as a longword, and we use the BCPL global vector in a2 to obtain the intuition.library base for the DisplayAlert call.
Repeat with negativity
Idea from last year: Execute a code block twice without destroying a register.
.chr ... ; d4 is initially 00000001 neg.l d4 ; 4484 blt.b .chr ; 6de8
Work backwards
Initially I was filling the buffer after the static \O/
and then
passed the “real” start of the alert message to DisplayAlert.
But
filling the buffer backwards only requires a short subq instruction
to get there.
Start after \O/ | End at \O/ |
---|---|
lea .buf(pc),a0 ; 41fa 0040 ; fill with (a0)+ ; ; lea .alert(pc),a0 ; 41fa 0002 jmp -90(a6) ; .alert dc.b 0,86,14,'\O/' .buf |
lea .end(pc),a0 ; 41fa 02ea ; fill with -(a0) ; ; subq.w #6,a0 ; 5d48 jmp -90(a6) ; .alert dc.b 0,86,14,'\O/' .buf .end = .buf+6*2*19*3 |
= 8 bytes | = 6 bytes |
Y-loop setup
We want to run the two-lines-at-once block three times, with y values of 22, 94, 166. The order doesn’t matter, but if we start with 22, we can set the initial value with a shorter moveq instruction.
166 down to 22 | 22 up to 166 |
---|---|
move.w #166,d2 ; 343c 00a6 .2lines ... sub.w #8*9,d2 ; 947c 0048 bge.b .2lines ; 6cd6 |
moveq #22,d2 ; 7416 .2lines ... add.b #8*9,d2 ; d43c 0048 bge.b .2lines ; 6cd6 |
= 10 bytes | = 8 bytes |
When I tried the add.b version I was happy that it worked, but then I was dumbfounded why it worked. Shouldn’t we get a negative byte result in the second execution already? Well, yes, but apparently I had a wrong idea of how the the greater-or-equal check works. I thought it tests if the result is >= zero after the addition.
moveq #22,d2 addq.b #8*9,d2 ; 22+72 = 94 ; Okay, greater-equal is true addq.b #8*9,d2 ; 94+72 = 166 unsigned = -90 signed ; Why is greater-equal still true? addq.b #8*9,d2 ; -90+72 = -18 ; Greater-equal is defo false now?
On the metal, the processor uses the negative (N) and overflow (V) flags for the “bge” test (see table 3-19 in the 68000’s reference manual):
(N and V) or (!N and !V)
During the execution of the y loop, this gives:
When | Operation | Result as signed byte | N | V | bge |
---|---|---|---|---|---|
Initial | 22 = 0x16 |
||||
After 1st run | 22 + 72 |
94 = 0x5e |
0 | 0 | true |
After 2nd run | 94 + 72 |
-90 = 0xa6 |
1 | 1 | true |
After 3rd run | -90 + 72 |
-18 = 0xee |
1 | 0 | false |
That means the code comment in the submitted version isn’t quite correct:
add.b #8*9,d2 ; next y bge.b .2lines ; (22+8*9*2).b < 0
And I need a new mental model of the “greater or equal” semantics, but I think I’ll need a diagram for that and incorporate the other cases and conditionals while I’m at it. Oh, and get more familiar with the 68000: When stepping through this in EASy68K I already learned that there are, in fact, two ways to encode a byte-wise register addition:
add.b #72,d2 ; d43c 0048 addi.b #72,d2 ; 0602 0048
Main thing is: It’s working as intended, even if it’s by accident. :)
Sadly, no centering
Even with the x and y coordinates mirrored diagonally, I was able to keep the gift box more-or-less centered by writing the x coordinate as a byte and placing a 1 in front of it. This would effectively move the whole image 256 pixels to the right.
; write [00 01 01 xx yy ch] backwards ; | | | | ; | | +--+--- x + 256 ; +---+-------- string end and continuation byte ; of the previous substring move.b d7,-(a0) ; '+' or '-' or '+' move.b d2,-(a0) ; y move.b d1,-(a0) ; x low-byte (0-255) move.b d0,-(a0) ; x high-byte (256) (d0=00000001) move.w d0,-(a0) ; string terminator and continuation byte
Eventually the extra move.b had to be sacrified, saving 2 bytes.
move.b d1,-(a0); 1101move.b d0,-(a0); 1100 move.w d1,-(a0) ; 3101
Not a huge loss, aesthetically, since that fake centering looked weird anyway:
movem non-optimization
The movem.w instruction looks cool, but isn’t any shorter than the verbose version. Well, using less lines of code is a size optimization, too…
Verbose | movem.w |
---|---|
move.b d7,-(a0) ; 1107 move.b d2,-(a0) ; 1102 move.w d1,-(a0) ; 3101 move.w d0,-(a0) ; 3100 |
movem.w d0/d1/d7,-(a0) ; 48a0 c100 add.b d2,4(a0) ; d528 0004 |
= 8 bytes | = 8 bytes |
Come to think of it, an eor.b would have looked even cooler…
A happy accident
As luck would have it, after plotting everything the x value is 174 – a perfect value for the height of the alert box!
Helper aliases
From the “Why didn’t I think of that before?” department: vasm supports register aliases! Not a code optimization per se, but really helpful if you shuffle around the registers a lot: reordering them for a correct movem.w instruction, or making sure that nice accidental value ends up in d1.
a_data equr a0 d_one equr d0 d_x equr d1 d_y equr d2 d_pos equr d4 d_cnt equr d6 d_ch equr d7 moveq #22,d_x moveq #'-',d_ch add.b d_y,4(a_data) ...
Magic constants
I’m using d6 for two things at once: decide when to output a plus sign within a line, and to check if the line is complete. After a lot of fiddling (button “counttest3” in the sandbox) I ended up with an initial value of -36 and an increment value of 3558 for that.
Code size for the loop and the checks is 12 bytes:
moveq #-36,d6 ; 7cdc .19chrs (set "-" as current char) .chr (output current char) ... add.w #3558,d6 ; dc7c 0de6 bcc.b .noplus ; 6402 ... (set char = "+" or "!", swap x/y, repeat .chr) ... tst.b d6 ; 4a06 bne.b .19chrs ; 66e0
The abort condition of the outer loop works because d6 is
0x1000
after 2×19 iterations – the byte value is zero
and the bne branch (not equal zero) is not taken.
The bcc check of the inner .chr loop is designed to be true every 19th iteration by overflowing, starting with the first one. In that case, a plus is written next.
|-.19chrs--------|-.chr x,y---------------|-.chr y,x---------------| | d6 | set d7 | write | d6 | set d7 | write | d6 | set d7 | |------|---------|------------------------|------------------------| | ffdc | - | - | 0dc2 | + | + | 1ba8 | ! | | 1ba8 | - | - | 298e | ! | ! | 3774 | ! | | 3774 | - | - | 455a | ! | ! | 5340 | ! | | 5340 | - | - | 6126 | ! | ! | 6f0c | ! | | 6f0c | - | - | 7cf2 | ! | ! | 8ad8 | ! | | 8ad8 | - | - | 98be | ! | ! | a6a4 | ! | | a6a4 | - | - | b48a | ! | ! | c270 | ! | | c270 | - | - | d056 | ! | ! | de3c | ! | | de3c | - | - | ec22 | ! | ! | fa08 | ! | | fa08 | - | - | 07ee | + | + | 15d4 | ! | | 15d4 | - | - | 23ba | ! | ! | 31a0 | ! | | 31a0 | - | - | 3f86 | ! | ! | 4d6c | ! | | 4d6c | - | - | 5b52 | ! | ! | 6938 | ! | | 6938 | - | - | 771e | ! | ! | 8504 | ! | | 8504 | - | - | 92ea | ! | ! | a0d0 | ! | | a0d0 | - | - | aeb6 | ! | ! | bc9c | ! | | bc9c | - | - | ca82 | ! | ! | d868 | ! | | d868 | - | - | e64e | ! | ! | f434 | ! | | f434 | - | - | 021a | + | + | 1000 | ! | |------|-----------------------------------------------------------| | 1000 | line is complete -- add 8*9 to y, repeat | |------|-----------------------------------------------------------|
The net gain of all this hassle compared to a straightforward counter and a dbf? Two whopping bytes!
If you look closely, you’ll notice that the first .chr
(drawing the horizontal line) always outputs
-
, never +
. Don’t we need +--------+--------+
for the horizontal lines? That’s the final trick!
Dirty typo hack
If we ignore the +
-switching, during a combined
horizontal and vertical line loop all
these characters would be written:
-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!
But we do need plus signs in some places.
++-!-!-!-!-!-!-!-!++-!-!-!-!-!-!-!-!++
That would be the clean solution, but is ugly to implement. It is easier to test for “every nth time” instead of “first and second time in a period of n”. In the early versions I did implement it in the complicated way, using two consecutive conditional branches to cover two cases:
subq.w #1,d6 bgt.b .noplus ; -1 = draw +, reset counter beq.b .norst ; 0 = draw +, keep counter moveq #17,d6 ; reset period counter .norst moveq #'+',d7 .noplus
The final version, however, uses a simpler “every nth time” check, as seen above:
-+-!-!-!-!-!-!-!-!-+-!-!-!-!-!-!-!-!-+
So we only ever switch a !
for a +
, effectively
filling all the horizontal lines with -
and only overwriting them in
the vertical lines with +
(where the lines meet).
Because DisplayAlert draws all strings on top of each other,
overwriting characters can go wrong. In fact, if we were to put a !
on a
+
, we would get pixel mush:
However, we can overwrite a -
with a +
! Lucky for us,
the minus sign is fully contained in the plus sign in all variations of the system font!
Is this trick really dirty? Maybe, but platforms with a character buffer for the screen don’t even have this problem, so I don’t feel bad at all. :)
All memory is zero, right?
After the submission and the official results I noticed an unintended “trick”. I had assumed that the memory of the message buffer will be all zero, so DisplayAlert would see a string terminator and a null continuation byte at the end. Under Kickstart 1.3, this only worked by accident when I tested it in the emulator and on real hardware.
After a reboot, or if you have loaded other programs before, the buffer area might contain random garbage. DisplayAlert will happily interpret whatever is in memory after the final plus sign as additional sub-messages:
While the executable doesn’t overwrite unallocated memory (good), the memory that is allocated in the file header isn’t cleared under older Kickstarts (bad). This can be fixed in the file header (good) by explicitly requesting our memory with the MEMF_CLEAR flag set, but doing so will only work under Kickstart 2.0 and above (bad). At least Kickstarts 2.0 and 3.1 will zero out the program’s extra memory anway, according to my tests (good).
Good, bad, good, bad – what are the options?
- Change the requirements for the executable
- Either demand that it is run directly after power-up, or
- require at least Kickstart 2.0.
- Invest 2 bytes to make the code more robust
- Insert a
clr.w (a0)
- Costly, but compatible across all Kickstarts!
- Insert a
- Take a magnifying glass and re-read the rules
- Ha-haa! This is struck through:
Your program must not draw anything else on screen than the object itself. - But this still holds: The final screen may not include any additional characters (which are not from the operating system itself).
- Tough luck… For a moment I thought the extra garbage could be interpreted forgivingly, as in TFM’s CPC entry:
But those extra icons definitely come from the operating system.
- Ha-haa! This is struck through:
- Act as if this was intentional from the get-go
- “In sizecoding territory, dirty tricks and evil assumptions are allowed!”
- Kind of lame…
Of course, the second option – make the code robust and stay compatible – is the nicest one! I’ll look into that for a possible after-deadline submission. Ideally after having saved two extra bytes somewhere else…
For now, the entry works just fine with newer Kickstarts, but extra caution must be taken under Kickstart 1.3 (and 1.2). Do not use in production or to operate self-driving cars! :)
Discarded ideas
I am pretty confident I have missed a nice optimization or two. Maybe even one of these that I haven’t thought about enough?
Character manipulation
I’ve looked into shortcuts for the different characters
a lot. Can I turn a minus into a plus easily? Or turn
+-
into !
in one go?
[0123456789abcdef] 0x20 [ !"#$%&'()*+,-./] 0x30 [0123456789:;<=>?] 0x40 [@ABCDEFGHIJKLMNO] 0x50 [PQRSTUVWXYZ[\]^_] [ ] -- 32 -- 0x20 -- %00100000 [!] -- 33 -- 0x21 -- %00100001 [+] -- 43 -- 0x2b -- %00101011 [-] -- 45 -- 0x2d -- %00101101 [!] -1 = [ ] (space) [-] -1 = [,] [-] -12 = [!] [+] -12 = [□] (replacement character) [-] -2 = [+] [!] -2 = [□] (replacement character) [-] eor 6 = [+] [!] eor 6 = ['] [-] eor 12 = [!]
Guess not…?
Rolling bits
Having a shifting bit pattern to do the “write a plus” decision and also the end-of-loop test would have been nice. But the pattern was to wide for that.
move.l #%01000000001000000001000000000000,d0 .loop add.l d0,d0 ; 1-bit left shift ; negative --> output a '+' ; zero --> line is complete
Couldn’t incorporate that in a space-saving, practical way. That move.l alone eats up 6 bytes!
Self-modifying code
That would have been cool, by definiton. We do have a pointer relative to our own code in a3. And if we request chip memory in the file header, this could even be compatible with faster processors, as caching of code in chip memory is disabled (if I remember correctly).
But I didn’t encounter an obvious opportunity for code-patching.
Different continuation byte
In the DisplayAlert substring format, the continuation byte doesn’t need
to be a 1, it can be any byte other than zero. If we could somehow make
sure that the x and y registers contain a value like 00nn
in
the upper word, we could set the string terminator, the continuation byte and
the x value with one move.l!
; write [00 nn 00 xx yy ch] backwards ; | | ; +---------+-- one move.l for all of this move.b d7,-(a0) ; '+' or '-' or '+' move.b d2,-(a0) ; ymove.w d1,-(a0); xmove.w d0,-(a0); string terminator and continuation byte move.l d1,-(a0) ; string terminator, continuation, x
That would save 2 bytes, but how do we pre-fill the registers like that? Put x and y into one register and do a swap instead of an exg? But then we need code to initialize that, and DisplayAlert wouldn’t have a “clean” long value for the height anymore.
We could also bset d1 with itself each time we initialize the x value, always setting bit #22 to 1:
.2lines moveq #22,d1 ; d1 = 0000 0016 bset d1,d1 ; d1 = 0040 0016 now, matching 00nn xxxx
Then we could do the move.l write. But: The code size would be the same, and there still wouldn’t be a suitable value we can pass to DisplayAlert as height. Also, this would interfere with the d1 and d2 exchange happening during .chr, working only half of the time.
Wrap it up better?
That bow \O/
and its coordinates eat up six precious bytes.
dc.b 0,86,14,'\O/' ; 00 56 0e 5c 4f 2f
Outrageous, but cute.
Speaking of wrapping up…
So much for keeping it short. But thanks for reading! When writing this, I was lowkey hoping for another optimization epiphany. Such is the rabbit hole that is sizecoding! :)
Downloads
Demo Scene
- Geschenkalarm
- new art
- hrrngh!
- B.S.I. – Byte Scene Investigation
- Shall we play a game?
- Worms VBI
- strss
- rotz
- wchrmas pattern
- More…
Blog
- VC³ results are in!
- A capital sharp S for Topaz
- Synthwave Xmas
- Keming war gestern
- Worms DeCoded
- VC³ 2024, an early present
- Fun with Worms DC custom levels
- Modding the Windows boot hand
- Unit Arrr
- I just favican’t
- More…