Geschenkalarm

29 Dec 2024
Demo Scene
Amiga Style

The Vintage Computing Christmas Challenge (VC³) is about recreating a text image in code, on a platform of your choice and with as little code as possible. This year’s reference image was a gift box:

My Amiga entry for VC³ ended up with a code size of 64 bytes (code below). That was small enough for a 33rd place (I think) out of 86 participants in the low-level/assembly category.

Edit: There isn’t really a dedicated assembly category (yet?). Together with Python, APL, and BASIC entries, it’s a shared 38th place!

As usual, near the top the entries get mind-boggingly small and clever – be sure to check out all the entries on the demozoo page once they’re uploaded, or watch the results presentation on YouTube.

Initial idea

This time around, I wasn’t in the mood to wrestle with the ancient AmigaOS BCPL layer again – 16 bytes of code to print a single letter? No, thanks!

So, no text output in the console this year. Then I had an idea: Why not use an alert? You know, like the infamous guru meditation (Amiga’s blue screen of death, if you will).

That would be kind of cool and a very Amiga thing to bring to the competition. It might even be well-suited for this specific image!

How to alert

There are two methods to display error messages in a blinking red box:

Alert (exec.library)
Reboots the system and displays a guru meditation with an error code provided by you; not very useful.
DisplayAlert (intuition.library)
Displays a custom message where you can position substrings with pixel precision and even write characters on top of each other (allowing for bold text and ASCII art effects). This is what we want!

DisplayAlert takes an alert number (can be used to trigger a reset), the total height and the message itself. The message data is in a custom format:

00 26 0f 'Software Failure.' 00 01
00 ea 0f 'Press left ...'    00 00
|     |                      |  |
|     |                      |  +-- more substrings to come?
|     |                      +----- string terminator
|     +---------------------------- y (byte)
+---------------------------------- x (word)

Plain and simple

The naïve implementation for the gift box would be:

        move.l  368(a2),a6      ; intuition base from global vector
        lea     message(pc),a0
        move.l  #184,d1
        jmp     -90(a6)         ; DisplayAlert
message:
        dc.b 0,234, 18,'        \O/',0,1
        dc.b 0,234, 26,'+--------+--------+',0,1
        dc.b 0,234, 34,'!        !        !',0,1
        dc.b 0,234, 42,'!        !        !',0,1
        dc.b 0,234, 50,'!        !        !',0,1
        dc.b 0,234, 58,'!        !        !',0,1
        dc.b 0,234, 66,'!        !        !',0,1
        dc.b 0,234, 74,'!        !        !',0,1
        dc.b 0,234, 82,'!        !        !',0,1
        dc.b 0,234, 90,'!        !        !',0,1
        dc.b 0,234, 98,'+--------+--------+',0,1
        dc.b 0,234,106,'!        !        !',0,1
        dc.b 0,234,114,'!        !        !',0,1
        dc.b 0,234,122,'!        !        !',0,1
        dc.b 0,234,130,'!        !        !',0,1
        dc.b 0,234,138,'!        !        !',0,1
        dc.b 0,234,146,'!        !        !',0,1
        dc.b 0,234,154,'!        !        !',0,1
        dc.b 0,234,162,'!        !        !',0,1
        dc.b 0,234,170,'+--------+--------+',0,0

Of course this is quite verbose and needs a whole 488 bytes of code and data. But the code to display everything is quite short already. It’s a start!

Having a working version is always great to get in the game. It’s so much easier to shave bytes down from there, rather than come up with clever optimizations in your head before the first version. In my head, at least! :)

Funnily enough, I already encountered a documentation bug at this point. DisplayAlert’s synopsis suggests that “height” is a word:

Response = DisplayAlert( AlertNumber, String, Height )
D0                       D0           A0      D1

BOOL DisplayAlert( ULONG, UBYTE *, WORD );

But the value in d1 is passed on as a longword! Under Kickstart 1.x with fast memory, this register usually contains a 32-bit memory address when your program starts. If you set the height as a 16-bit word, the upper 16 bits remain and DisplayAlert will try to create a gigantic bitmap for the message box.

.main   ...             ; d1 = 00c0 546c
        move.w  #184,d1 ; d1 = 00c0 00b8
        jmp     -90(a6) ; DisplayAlert with a height of 12,583,096

When that fails, it will return instantly and display nothing at all. Maybe it will also chuckle at you being all bamboozled, scratching your head in confusion.

Single characters

The main optimization idea: Use the fact that we are working with x/y coordinates, and that we can output one letter at a time. So instead of writing the gift box line by line, using counters to decide when to write +--+ and when to write ! !, we can juggle with coordinates and use the symmetry of the image, switching between +, ! and + along the way.

In order to to fail more efficiently when trying to be clever, I put together a quick’n’dirty JavaScript testbed. This way, I got faster turnaround times and had a debugging log for free. With a simple DisplayAlert emulation and lots of reloads, I came up with this scheme:

Draw two lines at once
- Plot - at x,y
- Plot ! at y,x (x and y reversed)
- Increment x by 1 column
- Repeat 18 times
Do that for three y values
- Draw lines at x,y
- Draw lines at x,y + 9 rows
- Draw lines at x,y + 18 rows
Have the line-draw code switch to + every 9 positions
Include that pesky \O/ bow as-is

Porting the JavaScript “pseudo code” to assembly language worked surprisingly well, and I was quickly down to 88 bytes.

In the course of a week I managed to trim down the code size bit by bit (or rather, word by word), but the basic approach stayed the same.

Commit	Size in bytes
inlined JS proto		88
moveq, subq instead of .w		82
d1 ends up with height		78
move.w d0, lose addq #1,a0		76
bgt, beq for "+" decision		74
count y down		72
abandon horiz centering		70
fill buffer backwards, count y up again		68
replace cmp.w, no gain		68
cooler with movem.w, but same size		68
dirty typo hack		66
different when-to-plus calculation		66
magic constant: use sum for "+" and line end		64

Submitted version

; registers: d0.l = 1, d4.l = 1, a2 = globvec

        lea     .bufend(pc),a0  ; fill backwards 
        moveq   #22,d2          ; y
.2lines moveq   #22,d1          ; x
        moveq   #-36,d6         ; counter when to use "+"
.19chrs moveq   #'-',d7         ; start with "-"
.chr    movem.w d0/d1/d7,-(a0)  ; |00 01|xx xx|yy|ch| <-- (a0) 
        add.b   d2,4(a0)        ; |___d0|___d1|d2|d7|
        moveq   #'!',d7         ; switch to "!"
        exg     d1,d2           ; swap x/y
        add.w   #3558,d6        ; use "+"?
        bcc.b   .noplus         ; overflows after 1, 19, 37 times
        moveq   #'+',d7
.noplus neg.l   d4              ; do .chr twice
        blt.b   .chr
        addq.w  #8,d1           ; next x
        tst.b   d6              ; (-36+38*3558).b = 0
        bne.b   .19chrs
        add.b   #8*9,d2         ; next y
        bge.b   .2lines         ; (22+8*9*2).b < 0
        move.l  368(a2),a6      ; intuition base
        subq.w  #6,a0           ; a0 = \O/, d1 = 174
        jmp     -90(a6)         ; DisplayAlert
        dc.w    86
        dc.b    14,'\O/'
.bufend = *+6*2*19*3

The resulting executable file looks like this:

OS overhead: Amiga executable hunk structure (36 bytes), with a hunk size of 0x1500 longwords to reserve extra space for the alert message buffer
Code: 68000 machine code and data (64 bytes)

00: 0000 03f3 0000 0000 0000 0001 0000 0000 ................
10: 0000 0000 0000 1500 0000 03e9 0000 0010 ................
20: 41fa 02ea 7416 7216 7cdc 7e2d 48a0 c100 A...t.r.|.~-H...
30: d528 0004 7e21 c342 dc7c 0de6 6402 7e2b .(..~!.B.|..d.~+
40: 4484 6de8 5041 4a06 66e0 d43c 0048 6cd6 D.m.PAJ.f..<.Hl.
50: 2c6a 0170 5d48 4eee ffa6 0056 0e5c 4f2f ,j.p]HN....V.\O/
60: 0000 03f2                               ....

And it looks like this:

Optimizations

I’ll keep this short, as there’s some discussion of 68000 assembly sizecoding tricks in the wrchmas pattern write-up already. But there’s one very dirty trick exclusive to this Chistmas present!

Edit: I noticed after the write-up that there’s an accidental optimization which is also very dirty, see: All memory is zero, right?

Known register contents

d0 and d4 both contain 1 as a longword, and we use the BCPL global vector in a2 to obtain the intuition.library base for the DisplayAlert call.

Repeat with negativity

Idea from last year: Execute a code block twice without destroying a register.

.chr    ...             ; d4 is initially 00000001
        neg.l   d4      ; 4484
        blt.b   .chr    ; 6de8

Work backwards

Initially I was filling the buffer after the static \O/ and then passed the “real” start of the alert message to DisplayAlert. But filling the buffer backwards only requires a short subq instruction to get there.

Start after \O/	End at \O/
lea .buf(pc),a0 ; 41fa 0040 ; fill with (a0)+ ; ; lea .alert(pc),a0 ; 41fa 0002 jmp -90(a6) ; .alert dc.b 0,86,14,'\O/' .buf	lea .end(pc),a0 ; 41fa 02ea ; fill with -(a0) ; ; subq.w #6,a0 ; 5d48 jmp -90(a6) ; .alert dc.b 0,86,14,'\O/' .buf .end = .buf+6219*3
= 8 bytes	= 6 bytes

Start after \O/

End at \O/

  lea .buf(pc),a0   ; 41fa 0040
                    ;
  fill with (a0)+   ;
                    ;
  lea .alert(pc),a0 ; 41fa 0002
  jmp -90(a6)       ;

.alert
  dc.b 0,86,14,'\O/'
.buf

  lea .end(pc),a0   ; 41fa 02ea
                    ;
  fill with -(a0)   ;
                    ;
  subq.w  #6,a0     ; 5d48
  jmp -90(a6)       ;

.alert
  dc.b 0,86,14,'\O/'
.buf
.end = .buf+6*2*19*3

= 8 bytes

= 6 bytes

Y-loop setup

We want to run the two-lines-at-once block three times, with y values of 22, 94, 166. The order doesn’t matter, but if we start with 22, we can set the initial value with a shorter moveq instruction.

166 down to 22	22 up to 166
move.w #166,d2 ; 343c 00a6 .2lines ... sub.w #8*9,d2 ; 947c 0048 bge.b .2lines ; 6cd6	moveq #22,d2 ; 7416 .2lines ... add.b #8*9,d2 ; d43c 0048 bge.b .2lines ; 6cd6
= 10 bytes	= 8 bytes

When I tried the add.b version I was happy that it worked, but then I was dumbfounded why it worked. Shouldn’t we get a negative byte result in the second execution already? Well, yes, but apparently I had a wrong idea of how the the greater-or-equal check works. I thought it tests if the result is >= zero after the addition.

  moveq   #22,d2

  addq.b  #8*9,d2 ; 22+72 = 94
                  ; Okay, greater-equal is true

  addq.b  #8*9,d2 ; 94+72 = 166 unsigned = -90 signed
                  ; Why is greater-equal still true?

  addq.b  #8*9,d2 ; -90+72 = -18
                  ; Greater-equal is defo false now?

On the metal, the processor uses the negative (N) and overflow (V) flags for the “bge” test (see table 3-19 in the 68000’s reference manual):

(N and V) or (!N and !V)

During the execution of the y loop, this gives:

When	Operation	Result as signed byte	N	V	bge
Initial		22 = 0x16
After 1st run	22 + 72	94 = 0x5e	0	0	true
After 2nd run	94 + 72	-90 = 0xa6	1	1	true
After 3rd run	-90 + 72	-18 = 0xee	1	0	false

That means the code comment in the submitted version isn’t quite correct:

        add.b   #8*9,d2         ; next y
        bge.b   .2lines         ; (22+8*9*2).b < 0

And I need a new mental model of the “greater or equal” semantics, but I think I’ll need a diagram for that and incorporate the other cases and conditionals while I’m at it. Oh, and get more familiar with the 68000: When stepping through this in EASy68K I already learned that there are, in fact, two ways to encode a byte-wise register addition:

        add.b   #72,d2  ; d43c 0048
        addi.b  #72,d2  ; 0602 0048

Main thing is: It’s working as intended, even if it’s by accident. :)

Sadly, no centering

Even with the x and y coordinates mirrored diagonally, I was able to keep the gift box more-or-less centered by writing the x coordinate as a byte and placing a 1 in front of it. This would effectively move the whole image 256 pixels to the right.

  ; write [00 01 01 xx yy ch] backwards
  ;        |   | |  |
  ;        |   | +--+--- x + 256
  ;        +---+-------- string end and continuation byte
  ;                      of the previous substring

  move.b d7,-(a0) ; '+' or '-' or '+'
  move.b d2,-(a0) ; y
  move.b d1,-(a0) ; x low-byte  (0-255)
  move.b d0,-(a0) ; x high-byte (256)   (d0=00000001)
  move.w d0,-(a0) ; string terminator and continuation byte

Eventually the extra move.b had to be sacrified, saving 2 bytes.

  move.b d1,-(a0) ; 1101
  move.b d0,-(a0) ; 1100
  move.w d1,-(a0) ; 3101

Not a huge loss, aesthetically, since that fake centering looked weird anyway:

movem non-optimization

The movem.w instruction looks cool, but isn’t any shorter than the verbose version. Well, using less lines of code is a size optimization, too…

Verbose	movem.w
move.b d7,-(a0) ; 1107 move.b d2,-(a0) ; 1102 move.w d1,-(a0) ; 3101 move.w d0,-(a0) ; 3100	movem.w d0/d1/d7,-(a0) ; 48a0 c100 add.b d2,4(a0) ; d528 0004
= 8 bytes	= 8 bytes

Come to think of it, an eor.b would have looked even cooler…

A happy accident

As luck would have it, after plotting everything the x value is 174 – a perfect value for the height of the alert box!

Helper aliases

From the “Why didn’t I think of that before?” department: vasm supports register aliases! Not a code optimization per se, but really helpful if you shuffle around the registers a lot: reordering them for a correct movem.w instruction, or making sure that nice accidental value ends up in d1.

a_data  equr    a0
d_one   equr    d0
d_x     equr    d1
d_y     equr    d2
d_pos   equr    d4
d_cnt   equr    d6
d_ch    equr    d7

        moveq   #22,d_x
        moveq   #'-',d_ch
        add.b   d_y,4(a_data)
        ...

Magic constants

I’m using d6 for two things at once: decide when to output a plus sign within a line, and to check if the line is complete. After a lot of fiddling (button “counttest3” in the sandbox) I ended up with an initial value of -36 and an increment value of 3558 for that.

Code size for the loop and the checks is 12 bytes:

        moveq   #-36,d6         ; 7cdc
.19chrs (set "-" as current char)
.chr    (output current char)
        ...
        add.w   #3558,d6        ; dc7c 0de6
        bcc.b   .noplus         ; 6402
        ...
        (set char = "+" or "!", swap x/y, repeat .chr)
        ...
        tst.b   d6              ; 4a06
        bne.b   .19chrs         ; 66e0

The abort condition of the outer loop works because d6 is 0x1000 after 2×19 iterations – the byte value is zero and the bne branch (not equal zero) is not taken.

The bcc check of the inner .chr loop is designed to be true every 19th iteration by overflowing, starting with the first one. In that case, a plus is written next.

|-.19chrs--------|-.chr x,y---------------|-.chr y,x---------------|
| d6   | set d7  | write | d6   | set d7  | write | d6   | set d7  |
|------|---------|------------------------|------------------------|
| ffdc |    -    |   -   | 0dc2 |    +    |   +   | 1ba8 |    !    |
| 1ba8 |    -    |   -   | 298e |    !    |   !   | 3774 |    !    |
| 3774 |    -    |   -   | 455a |    !    |   !   | 5340 |    !    |
| 5340 |    -    |   -   | 6126 |    !    |   !   | 6f0c |    !    |
| 6f0c |    -    |   -   | 7cf2 |    !    |   !   | 8ad8 |    !    |
| 8ad8 |    -    |   -   | 98be |    !    |   !   | a6a4 |    !    |
| a6a4 |    -    |   -   | b48a |    !    |   !   | c270 |    !    |
| c270 |    -    |   -   | d056 |    !    |   !   | de3c |    !    |
| de3c |    -    |   -   | ec22 |    !    |   !   | fa08 |    !    |
| fa08 |    -    |   -   | 07ee |    +    |   +   | 15d4 |    !    |
| 15d4 |    -    |   -   | 23ba |    !    |   !   | 31a0 |    !    |
| 31a0 |    -    |   -   | 3f86 |    !    |   !   | 4d6c |    !    |
| 4d6c |    -    |   -   | 5b52 |    !    |   !   | 6938 |    !    |
| 6938 |    -    |   -   | 771e |    !    |   !   | 8504 |    !    |
| 8504 |    -    |   -   | 92ea |    !    |   !   | a0d0 |    !    |
| a0d0 |    -    |   -   | aeb6 |    !    |   !   | bc9c |    !    |
| bc9c |    -    |   -   | ca82 |    !    |   !   | d868 |    !    |
| d868 |    -    |   -   | e64e |    !    |   !   | f434 |    !    |
| f434 |    -    |   -   | 021a |    +    |   +   | 1000 |    !    |
|------|-----------------------------------------------------------|
| 1000 | line is complete -- add 8*9 to y, repeat                  |
|------|-----------------------------------------------------------|

The net gain of all this hassle compared to a straightforward counter and a dbf? Two whopping bytes!

If you look closely, you’ll notice that the first .chr (drawing the horizontal line) always outputs -, never +. Don’t we need +--------+--------+ for the horizontal lines? That’s the final trick!

Dirty typo hack

If we ignore the +-switching, during a combined horizontal and vertical line loop all these characters would be written:

-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!

But we do need plus signs in some places.

++-!-!-!-!-!-!-!-!++-!-!-!-!-!-!-!-!++

That would be the clean solution, but is ugly to implement. It is easier to test for “every nth time” instead of “first and second time in a period of n”. In the early versions I did implement it in the complicated way, using two consecutive conditional branches to cover two cases:

        subq.w  #1,d6
        bgt.b   .noplus ; -1 = draw +, reset counter
        beq.b   .norst  ;  0 = draw +, keep counter
        moveq   #17,d6  ; reset period counter
.norst  moveq   #'+',d7
.noplus

The final version, however, uses a simpler “every nth time” check, as seen above:

-+-!-!-!-!-!-!-!-!-+-!-!-!-!-!-!-!-!-+

So we only ever switch a ! for a +, effectively filling all the horizontal lines with - and only overwriting them in the vertical lines with + (where the lines meet).

Because DisplayAlert draws all strings on top of each other, overwriting characters can go wrong. In fact, if we were to put a ! on a +, we would get pixel mush:

However, we can overwrite a - with a +! Lucky for us, the minus sign is fully contained in the plus sign in all variations of the system font!

Thanks, Topaz!

Is this trick really dirty? Maybe, but platforms with a character buffer for the screen don’t even have this problem, so I don’t feel bad at all. :)

All memory is zero, right?

After the submission and the official results I noticed an unintended “trick”. I had assumed that the memory of the message buffer will be all zero, so DisplayAlert would see a string terminator and a null continuation byte at the end. Under Kickstart 1.3, this only worked by accident when I tested it in the emulator and on real hardware.

After a reboot, or if you have loaded other programs before, the buffer area might contain random garbage. DisplayAlert will happily interpret whatever is in memory after the final plus sign as additional sub-messages:

While the executable doesn’t overwrite unallocated memory (good), the memory that is allocated in the file header isn’t cleared under older Kickstarts (bad). This can be fixed in the file header (good) by explicitly requesting our memory with the MEMF_CLEAR flag set, but doing so will only work under Kickstart 2.0 and above (bad). At least Kickstarts 2.0 and 3.1 will zero out the program’s extra memory anway, according to my tests (good).

Good, bad, good, bad – what are the options?

Change the requirements for the executable
- Either demand that it is run directly after power-up, or
- require at least Kickstart 2.0.
Invest 2 bytes to make the code more robust
- Insert a clr.w (a0)
- Costly, but compatible across all Kickstarts!
Take a magnifying glass and re-read the rules
- Ha-haa! This is struck through: ~~Your program must not draw anything else on screen than the object itself.~~
- But this still holds: The final screen may not include any additional characters (which are not from the operating system itself).
- Tough luck… For a moment I thought the extra garbage could be interpreted forgivingly, as in TFM’s CPC entry:
  
  But those extra icons definitely come from the operating system.
Act as if this was intentional from the get-go
- “In sizecoding territory, dirty tricks and evil assumptions are allowed!”
- Kind of lame…

Of course, the second option – make the code robust and stay compatible – is the nicest one! I’ll look into that for a possible after-deadline submission. Ideally after having saved two extra bytes somewhere else…

For now, the entry works just fine with newer Kickstarts, but extra caution must be taken under Kickstart 1.3 (and 1.2). Do not use in production or to operate self-driving cars! :)

Discarded ideas

I am pretty confident I have missed a nice optimization or two. Maybe even one of these that I haven’t thought about enough?

Character manipulation

I’ve looked into shortcuts for the different characters a lot. Can I turn a minus into a plus easily? Or turn +- into ! in one go?

     [0123456789abcdef]
0x20 [ !"#$%&'()*+,-./]
0x30 [0123456789:;<=>?]
0x40 [@ABCDEFGHIJKLMNO]
0x50 [PQRSTUVWXYZ[\]^_]

[ ] -- 32 -- 0x20 -- %00100000
[!] -- 33 -- 0x21 -- %00100001
[+] -- 43 -- 0x2b -- %00101011
[-] -- 45 -- 0x2d -- %00101101

[!] -1 = [ ] (space)
[-] -1 = [,]

[-] -12 = [!]
[+] -12 = [□] (replacement character)

[-] -2 = [+]
[!] -2 = [□] (replacement character)

[-] eor 6 = [+]
[!] eor 6 = [']

[-] eor 12 = [!]

Guess not…?

Rolling bits

Having a shifting bit pattern to do the “write a plus” decision and also the end-of-loop test would have been nice. But the pattern was to wide for that.

  move.l #%01000000001000000001000000000000,d0
.loop
  add.l d0,d0 ; 1-bit left shift
  ; negative --> output a '+'
  ; zero     --> line is complete

Couldn’t incorporate that in a space-saving, practical way. That move.l alone eats up 6 bytes!

Self-modifying code

That would have been cool, by definiton. We do have a pointer relative to our own code in a3. And if we request chip memory in the file header, this could even be compatible with faster processors, as caching of code in chip memory is disabled (if I remember correctly).

But I didn’t encounter an obvious opportunity for code-patching.

Different continuation byte

In the DisplayAlert substring format, the continuation byte doesn’t need to be a 1, it can be any byte other than zero. If we could somehow make sure that the x and y registers contain a value like 00nn in the upper word, we could set the string terminator, the continuation byte and the x value with one move.l!

  ; write [00 nn 00 xx yy ch] backwards
  ;        |         |
  ;        +---------+-- one move.l for all of this
  
  move.b d7,-(a0) ; '+' or '-' or '+'
  move.b d2,-(a0) ; y
  move.w d1,-(a0) ; x
  move.w d0,-(a0) ; string terminator and continuation byte
  move.l d1,-(a0) ; string terminator, continuation, x

That would save 2 bytes, but how do we pre-fill the registers like that? Put x and y into one register and do a swap instead of an exg? But then we need code to initialize that, and DisplayAlert wouldn’t have a “clean” long value for the height anymore.

We could also bset d1 with itself each time we initialize the x value, always setting bit #22 to 1:

.2lines moveq   #22,d1  ; d1 = 0000 0016
        bset    d1,d1   ; d1 = 0040 0016 now, matching 00nn xxxx

Then we could do the move.l write. But: The code size would be the same, and there still wouldn’t be a suitable value we can pass to DisplayAlert as height. Also, this would interfere with the d1 and d2 exchange happening during .chr, working only half of the time.

Wrap it up better?

That bow \O/ and its coordinates eat up six precious bytes.

  dc.b 0,86,14,'\O/' ; 00 56 0e 5c 4f 2f

Outrageous, but cute.

Speaking of wrapping up…

So much for keeping it short. But thanks for reading! When writing this, I was lowkey hoping for another optimization epiphany. Such is the rabbit hole that is sizecoding! :)