heckmeck!

Nerd content and cringe since 1999

Alexander Grupe
Losso/ATW

Temba, seine Arme weit!

A 512 byte intro made for Nordlicht 2025. Also, a tribute to the mother of all (German) fan dubs: “Sinnlos im Weltraum” :)

WTF is “Sinnlos im Weltraum” (roughly “Pointless in Space”)? A bunch of Star Trek TNG episodes re-dubbed in the late 1990s, all in a weird German dialect, putting a spin on the story lines and each of the characters: Picard becomes a grumpy choleric who constantly threatens to beat up everyone, Riker is a child-like doofus, Geordi is utterly confused all the time, etc. Black coffee plays a big role, as do the self-made sound effects and music. SiW gained a bit of a cult following – it’s even got its own Wikipedia page!

The intro shows Dathon, the Tamarian captain from the TNG episode “Darmok”. In the German SiW version, he has a very unique way of talking and says “Joooaaaahh” a lot – that’s also in the intro! :)

Tamarians in TNG, Lower Decks, and my own rendition

Image copyright for the Tamarians on the left: Paramount Global, used in low resolution under “fair use” terms for educational purposes.

Why the “Sinnlos im Weltraum” homage? Because the party website was sci-fi themed (galactic quest, futuristic neon adventure, etc.), and I have a habit of adapting the party theme in 512 bytes!

It speaks!

Well, at least it does the “Joooaaahh” sound from the SiW episode. This was the main idea for the intro: Use the new speech synthesis method I stumbled upon to produce vowel sounds, then let them blend into each other.

The implementation is simple: Each vowel sound is the sum of the first n harmonic waves of the base frequency, each multiplied by a factor, and those factors shape the sound into an “eeeh” or “ooh”.
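As a rough illustration of that idea (a floating-point Python sketch, not the intro's code), the function below sums the first n harmonics of a base frequency, each scaled by its own factor. The factor lists, the 8 kHz sample rate, and the function name are made-up assumptions; the values are not tuned to sound like any particular vowel.

```python
import math

def vowel_wave(base_freq, factors, sample_rate=8000, num_samples=400):
    """Additive synthesis: harmonic k (k = 1..len(factors)) of base_freq, scaled by factors[k-1]."""
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        value = 0.0
        for k, factor in enumerate(factors, start=1):
            value += factor * math.sin(2 * math.pi * k * base_freq * t)
        samples.append(value)
    return samples

# Hypothetical factor sets for two "vowels"; illustrative only, not measured values.
FACTORS_A = [1.0, 0.6, 0.2, 0.05, 0.0, 0.0, 0.0, 0.0]
FACTORS_B = [0.3, 0.1, 0.05, 0.1, 0.4, 0.5, 0.3, 0.1]

wave_a = vowel_wave(110.0, FACTORS_A)
wave_b = vowel_wave(110.0, FACTORS_B)
```

Blending one vowel into another then amounts to moving each factor from one list towards the other over time.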

For some background, pseudocode, and actual code, see:

The important thing is that this approach seemed like a good fit for a 512 byte production. To prototype the “Jooaah” sound, I hacked up the speech tool to include an FFT display of a waveform that I could overlay onto the harmonic factor knobs, reconstructing the original “Jooaah” sound one vowel at a time.
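How the FFT overlay is implemented isn't shown here. As a hypothetical stand-in, this Python sketch measures the strength of each harmonic directly with a single-bin DFT: it correlates the waveform with a sine and a cosine at k times the base frequency, which yields exactly the kind of per-harmonic amplitude the factor knobs represent. It assumes the base frequency is known and the recording spans a whole number of periods.

```python
import math

def harmonic_amplitudes(samples, base_freq, sample_rate, count=8):
    """Estimate the amplitude of harmonics 1..count via one DFT bin per harmonic."""
    n = len(samples)
    amps = []
    for k in range(1, count + 1):
        w = 2 * math.pi * k * base_freq / sample_rate   # radians per sample for harmonic k
        re = sum(x * math.cos(w * i) for i, x in enumerate(samples))
        im = sum(x * math.sin(w * i) for i, x in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)         # scaled so a unit sine reads as 1.0
    return amps

# Round trip: build a test wave from known factors, then recover them.
fs, f0 = 8000, 100                      # 80 samples per period
known = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.1]
wave = [sum(a * math.sin(2 * math.pi * k * f0 * i / fs) for k, a in enumerate(known, start=1))
        for i in range(800)]            # exactly 10 periods
print([round(a, 3) for a in harmonic_amplitudes(wave, f0, fs)])
```

With a real recording, the window would not hold an exact number of periods, so the estimates would leak into neighbouring harmonics; that is one reason a visual overlay plus manual tweaking is a practical approach.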

After that, I coded the speech synth in 68000 assembly, with the help of another prototyping tool. This tool runs the assembly code and plays back the generated waveform immediately.

Wave output test tool
  ; @factors = points to 8 harmonic factors
  ; @wav     = points to the waveform we're building

  moveq   #0,@smp                 ; initial sample value := 0
  move.l  @sinpos,@pos            ; copy global sine position

  rept 8
    move.l  @pos,@tmp             ; look up sine_table[sinpos]
    swap    @tmp
    and.w   #1024-2,@tmp          ; sine table has 512 words
    move.w  (@sintab,@tmp.w),@tmp ; sine value
    move.w  (@factors)+,@val      ; factor for this harmonic wave
    addq.w  #2,@factors           ; skip fractional bits
    muls    @val,@tmp             ; sinval *= factor
    add.l   @tmp,@smp             ; sample += sinval
    add.l   @sinpos,@pos          ; next harmonic
  endr

  swap    @smp                    ; sample: $007fff00 --> $007f
  move.b  @smp,(@wav)+            ; output sample byte
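To make the register juggling easier to follow, here is a Python model of one pass through that loop, using plain integers to mirror each 68000 instruction. The register roles come from the listing; the sine table's amplitude (±32767), the helper names, and the example factor are assumptions, and the outer code that advances @sinpos once per sample is not shown. Positions and factors are 16.16 fixed-point longwords, and only their upper word is used as table offset or multiplier.

```python
import math

SINE_WORDS = 512  # "sine table has 512 words", addressed by an even byte offset

def make_sine_table(amplitude=32767):
    # Assumed table format: signed 16-bit words covering one full period.
    return [round(amplitude * math.sin(2 * math.pi * i / SINE_WORDS)) for i in range(SINE_WORDS)]

def to_s16(word):
    """Interpret the low 16 bits as a signed word, as muls does with its operands."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def render_sample(sinpos, factors, sintab):
    """One output byte from 8 harmonics; sinpos and factors are 16.16 longwords."""
    smp = 0                                         # moveq #0,@smp
    pos = sinpos                                    # move.l @sinpos,@pos
    for factor in factors:                          # rept 8
        offset = (pos >> 16) & (1024 - 2)           # swap, and.w #1024-2: even byte offset
        sinval = sintab[offset // 2]                # move.w (@sintab,@tmp.w),@tmp
        val = to_s16(factor >> 16)                  # move.w (@factors)+: big-endian, integer word first
        smp = (smp + sinval * val) & 0xFFFFFFFF     # muls, add.l (32-bit wraparound)
        pos = (pos + sinpos) & 0xFFFFFFFF           # add.l @sinpos,@pos: next harmonic
    return (smp >> 16) & 0xFF                       # swap, move.b: bits 16..23 of the sum

table = make_sine_table()
factors = [0x00FF0000] + [0] * 7                    # hypothetical: harmonic 1 at 255.0, rest silent
print(hex(render_sample(256 << 16, factors, table)))   # byte offset 256 = table index 128 = sine peak
```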

All values use a 16.16 fixed-point format, i.e. the upper 16-bit word is the integer part and the lower 16 bits represent the fractional part. (The 68000 CPU doesn’t support “real” floating-point arithmetic.)

$00010000 = %00000000000000010000000000000000 = 1
$00018000 = %00000000000000011000000000000000 = 1.5
$0003243f = %00000000000000110010010000111111 = 3.14159
etc.

This way, we can fine-tune the base frequency and interpolate the harmonic factors very smoothly.
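A small Python sketch of the 16.16 format: converting to and from floats reproduces the three example values above, and a hypothetical linear ramp shows why the fractional bits matter for smooth interpolation. The function names and the ramp are illustrative assumptions; the intro's own interpolation code isn't shown.

```python
import math

def to_fixed(x):
    """Float -> 16.16 fixed point as an unsigned 32-bit longword (two's complement if negative)."""
    return round(x * 65536) & 0xFFFFFFFF

def from_fixed(v):
    """16.16 longword -> float."""
    v &= 0xFFFFFFFF
    if v & 0x80000000:
        v -= 1 << 32
    return v / 65536

for x in (1, 1.5, math.pi):
    v = to_fixed(x)
    print(f"${v:08x} = %{v:032b} = {from_fixed(v):.5f}")

def ramp(start, target, steps):
    """Hypothetical linear ramp between two non-negative 16.16 values, integer adds only."""
    step = (target - start) // steps     # can be far smaller than 1.0 (= $10000)
    values = [start + step * i for i in range(steps)]
    values.append(target)                # land exactly on the target
    return values

# Moving a factor from 2.0 to 3.0 in 256 steps: each step adds 1/256.
r = ramp(to_fixed(2.0), to_fixed(3.0), 256)
```

The integer part (`v >> 16`, what `swap` exposes) only changes once during that ramp, but the fraction keeps accumulating, so no step is lost to rounding.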

Originally I planned to do the sound synthesis in realtime, writing the audio data with the CPU, but all those multiplications were too slow for an acceptable sample rate. Instead, there’s a single 64 KB buffer now that gets repeated throughout the intro.
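A back-of-the-envelope calculation shows why realtime synthesis was tight. This Python sketch assumes a PAL Amiga's 68000 clock of about 7.09 MHz and the worst-case MULS.W timing of 70 cycles (38 + 2n cycles, with n at most 16, per Motorola's timing tables). The sample rates are hypothetical, since the intro's rate isn't stated, and the sketch ignores the loop's other instructions and any DMA contention, so the real budget is even tighter.

```python
PAL_CPU_HZ = 7_093_790        # PAL Amiga 68000 clock (NTSC machines run at about 7.16 MHz)
MULS_WORST_CYCLES = 70        # MULS.W: 38 + 2n cycles, n <= 16
HARMONICS = 8                 # one muls per harmonic in the rept 8 loop

def cycles_per_sample(sample_rate):
    """CPU cycles available per output sample if the CPU did nothing else."""
    return PAL_CPU_HZ / sample_rate

def muls_share(sample_rate):
    """Fraction of the whole CPU consumed by worst-case multiplies alone."""
    return HARMONICS * MULS_WORST_CYCLES / cycles_per_sample(sample_rate)

def buffer_seconds(buffer_bytes, sample_rate):
    """Playback time of a buffer of 8-bit samples before it repeats."""
    return buffer_bytes / sample_rate

for rate in (8000, 11025, 16000):     # hypothetical sample rates
    print(f"{rate} Hz: {cycles_per_sample(rate):.0f} cycles/sample, "
          f"muls alone {muls_share(rate):.0%}, 64 KB lasts {buffer_seconds(65536, rate):.1f} s")
```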

I like big pixels and I cannot lie

The second thing I built for the intro: A low resolution, but recognizable Tamarian head! Cute or dorky, if possible…

After several revisions, I went for a resolution of 15×14 pixels, stored as 14 rows of 16 bytes each (the 16th byte of every row stays empty). In other words: The pixels are already horizontally stretched, with each fat pixel taking up one byte = eight pixels.

__ = 0
XX = $ff

  dc.b __,__,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,__,__
  dc.b __,__,XX,XX,XX,__,__,__,XX,__,__,XX,XX,__,__,__
  dc.b __,XX,__,__,XX,__,__,__,__,XX,__,__,__,XX,__,__
  dc.b __,XX,__,__,__,__,__,__,__,XX,__,__,__,__,XX,__
  dc.b XX,__,__,__,__,XX,XX,XX,__,__,__,XX,XX,XX,__,__
  dc.b XX,__,__,__,XX,__,__,__,XX,__,XX,__,__,__,XX,__
  dc.b XX,__,__,__,XX,__,XX,__,XX,__,XX,__,XX,__,XX,__
  dc.b __,XX,__,__,__,XX,XX,XX,__,__,__,XX,XX,XX,__,__
  dc.b __,XX,XX,__,__,__,__,__,XX,__,XX,__,__,__,XX,__
  dc.b __,XX,__,__,XX,__,__,__,__,__,__,__,__,__,XX,__
  dc.b __,XX,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,XX,__
  dc.b __,__,XX,__,__,__,__,__,__,__,__,__,__,XX,__,__
  dc.b __,__,__,XX,XX,__,__,__,__,__,__,XX,XX,__,__,__
  dc.b __,__,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,__,__
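To preview the head without an Amiga, this Python sketch expands the same `dc.b` rows the way the display does: each byte becomes eight hardware pixels, and each row can be repeated eight times vertically. The function names and the `#`/`.` characters are arbitrary choices for the sketch.

```python
ROWS = [
    "__,__,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,__,__",
    "__,__,XX,XX,XX,__,__,__,XX,__,__,XX,XX,__,__,__",
    "__,XX,__,__,XX,__,__,__,__,XX,__,__,__,XX,__,__",
    "__,XX,__,__,__,__,__,__,__,XX,__,__,__,__,XX,__",
    "XX,__,__,__,__,XX,XX,XX,__,__,__,XX,XX,XX,__,__",
    "XX,__,__,__,XX,__,__,__,XX,__,XX,__,__,__,XX,__",
    "XX,__,__,__,XX,__,XX,__,XX,__,XX,__,XX,__,XX,__",
    "__,XX,__,__,__,XX,XX,XX,__,__,__,XX,XX,XX,__,__",
    "__,XX,XX,__,__,__,__,__,XX,__,XX,__,__,__,XX,__",
    "__,XX,__,__,XX,__,__,__,__,__,__,__,__,__,XX,__",
    "__,XX,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,XX,__",
    "__,__,XX,__,__,__,__,__,__,__,__,__,__,XX,__,__",
    "__,__,__,XX,XX,__,__,__,__,__,__,XX,XX,__,__,__",
    "__,__,__,__,__,XX,XX,XX,XX,XX,XX,__,__,__,__,__",
]

def parse(rows):
    """'__' -> 0, 'XX' -> $ff, like the two equates above."""
    return [[0xFF if cell == "XX" else 0x00 for cell in row.split(",")] for row in rows]

def render(bitmap, v_repeat=8):
    """Expand each byte to its 8 bits (pixels) and repeat each row v_repeat times."""
    lines = []
    for row in bitmap:
        line = "".join("#" if byte & (0x80 >> bit) else "." for byte in row for bit in range(8))
        lines.extend([line] * v_repeat)
    return lines

bitmap = parse(ROWS)
for line in render(bitmap, v_repeat=1):   # v_repeat=1 keeps the console output short
    print(line)
```

At full stretch, the 224 bytes of bitmap data cover 128×112 display pixels.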

The vertical stretch is done by the copperlist: Display a new line, then display that same line for the next seven display rows, repeat. In each line, we wait for the right-most display position, ignoring (masking) the Y position, and tell the display DMA to either continue with the next line or skip back 16 bytes. This way, we get a nice, repetitive chunk of Copper wait commands:

  dc.w $0108,-16   ; repeat first line (blank)
  dc.w $8101,$fffe ; top y position ($81..$f9)

  dc.w $80df,$00fe,$0108,0 ; show new line (  0 = continue bitmap)
  dc.w $80df,$00fe,$0108,-16 ; repeat line (-16 = go back 16 bytes)
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line

  dc.w $80df,$00fe,$0108,0 ; show new line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line
  dc.w $80df,$00fe,$0108,-16 ; repeat line

  etc.
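Because the list is so regular, it is easy to generate. This Python sketch builds the same words as the listing: the initial modulo and top wait, then, for each of the 14 rows, one “continue” entry followed by seven “repeat” entries. The helper names are hypothetical, and the closing `$ffff,$fffe` pair is the conventional Copper end-of-list wait, added as an assumption because the listing stops at “etc.”.

```python
import struct

BPL1MOD = 0x0108                    # bitplane modulo register (odd planes)
WAIT_END_OF_LINE = (0x80DF, 0x00FE) # wait for the right-most position, Y compare masked out
ROW_BYTES = 16                      # one bitmap row = 16 fat-pixel bytes

def modulo_word(value):
    """Copper MOVE data is a 16-bit word; negative moduli use two's complement."""
    return value & 0xFFFF

def stretched_copperlist(rows=14, repeat=8):
    words = [BPL1MOD, modulo_word(-ROW_BYTES),   # repeat first line (blank)
             0x8101, 0xFFFE]                     # top y position
    for _ in range(rows):
        for i in range(repeat):
            modulo = 0 if i == 0 else -ROW_BYTES # 0 = continue bitmap, -16 = go back one row
            words += [*WAIT_END_OF_LINE, BPL1MOD, modulo_word(modulo)]
    words += [0xFFFF, 0xFFFE]                    # assumed: standard end-of-list wait
    return struct.pack(f">{len(words)}H", *words)  # big-endian, as the 68000 stores it

copper = stretched_copperlist()
print(len(copper), "bytes:", copper[:24].hex(" "))
```

Generating the list would cost code bytes; storing it literally costs almost nothing after compression, since the entries repeat verbatim.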

“But all those repetitions and those fat pixels – don’t they use up a lot of space?” They do, but only before the compression!

Better compression

I fiddled a lot with the compression this time (again using ZX0 as the compression algorithm).

  • I optimized the common 68000 decompression routine for ZX0 – it’s two bytes shorter now.
  • I used a smaller ZX0 end marker for my data. This only saved two bits of control data, but sometimes this can save you a byte in the compressed payload!
  • Less data: I removed in-between interpolation target steps as long as the resulting sound still sounded more-or-less the same.
  • I saved another two bytes by doing something very dirty: The decompression target address is taken from the opcodes of the depacker code itself. Drawback: This will only work on 68000 CPUs where the upper 8 bits of a 32-bit address are ignored.
          lea     $50000,a4       ; 49f9 0005 0000
          move.l  .opc(pc),a4     ; 287a 0032
          ...
    .opc  move.w  d5,d6           ; 3c05
          lsl.w   #8,d6           ; e14e
    
          ; a4 now contains $3c05e14e, which is treated
          ; like $0005e14e on the 68000. 2 bytes saved!
  • Compression optimizations by analyzing the raw data. I wrote a tool to visualize repeating data words and to help me pick color values that match data words already present in the binary. I found some reusable values this way, but it got tedious quickly…

  • Random, mostly gut-driven compression optimizations: Flip the Tamarian head vertically, try out different opcodes for the same tasks, insert a chunk of zeroes somewhere. High dopamine levels possible when successful, but also high levels of “What exactly am I doing here?” :)
  • As a final test of my sanity, I started playing around with random permutations. There are sections in the code where the order doesn’t matter for the result: Disable sprites first, then install the copperlist, or the other way around? But also data blocks and… well, nearly everything.
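The dirty opcode trick from the list above can be checked with a few lines of Python. The sketch reads the two instruction words after `.opc` as one big-endian longword, as `move.l .opc(pc),a4` does, then keeps only the low 24 bits, because the 68000 (like the 68010) has a 24-bit address space. The helper names are made up; the opcode bytes are copied from the listing.

```python
import struct

LEA_ABS = bytes.fromhex("49f900050000")      # lea $50000,a4
MOVE_PCREL = bytes.fromhex("287a0032")       # move.l .opc(pc),a4
OPC = bytes.fromhex("3c05e14e")              # move.w d5,d6 / lsl.w #8,d6

def longword(buf, offset=0):
    """Big-endian 32-bit read, like the 68000's move.l."""
    return struct.unpack_from(">I", buf, offset)[0]

def bus_address(value):
    """68000/68010: only address bits 0..23 matter."""
    return value & 0x00FFFFFF

a4 = longword(OPC)
print(f"a4 = ${a4:08x}, seen by the bus as ${bus_address(a4):06x}, "
      f"{len(LEA_ABS) - len(MOVE_PCREL)} bytes saved")
```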

Trying out all possible combinations for a block of code quickly becomes a huge task. For nine lines, there are 9! = 362,880 different sequences to be tested, each with a cycle of “assemble, ZX0-compress, measure the number of bits”. Also, the results are often counterintuitive and brittle, heavily depending on all the other bits staying where they are.
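A minimal sketch of such a brute-force loop in Python, under two loud assumptions: the chunks are already-assembled, order-independent byte strings (the assemble step is skipped), and zlib stands in for ZX0, which has no Python standard library binding, so the sizes are only a rough proxy for ZX0 bits. The opcode words are borrowed from the listings in this post and treated as independent purely for the demo.

```python
import itertools
import math
import zlib

def compressed_bits(blob):
    # Stand-in size metric: zlib at maximum level, in bits. Not ZX0.
    return len(zlib.compress(blob, 9)) * 8

def best_order(chunks):
    """Try every ordering of independent chunks; return (bits, order) of the smallest result."""
    best = None
    for order in itertools.permutations(range(len(chunks))):
        bits = compressed_bits(b"".join(chunks[i] for i in order))
        if best is None or bits < best[0]:
            best = (bits, order)
    return best

chunks = [bytes.fromhex(h) for h in ("41fa005c", "227a0034", "2248", "4851", "3c05", "e14e")]
bits, order = best_order(chunks)
print(f"{math.factorial(len(chunks))} orderings tried, best order {order}: {bits} bits "
      f"(original: {compressed_bits(b''.join(chunks))} bits)")
```

Since `itertools.permutations` yields the original order first, the result can never be worse than the starting point; for nine chunks, the same loop runs 362,880 compressions.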

That being said, this approach did help me save the last two bytes I needed when I was already on the train to Bremen. Through the power of brute-forcing, the intro now exits cleanly and restores the mouse pointer when done.

Compatibility

Having the intro exit cleanly is nice, but I would have loved to make everything more compatible and avoid unpacking into a fixed, unallocated memory region as well.

I had already found a way to do that while saving two bytes in the decompress code, by using the destination address as the start-of-data address: We can just reverse the bytes of the ZX0 data and replace every (a0)+ with -(a0). Starting from the same address, the decompress code would read the ZX0 data backwards and write the decompression result forwards.

  ; old version:
  lea     .zx0(pc),a0     ; 41fa 005c - source data
  move.l  .opc(pc),a1     ; 227a 0034 - absolute destination

  ; new version (2 bytes shorter):
  lea     .zx0_end(pc),a0 ; 41fa 01d8 - source data
  move.l  a0,a1           ; 2248      - destination = source
  pea     (a1)            ; 4851
.zx0_decompress:
  ...

  incbin  meat.zx0.reversed

.zx0_end:

  dx.b 1024*80 ; room for decompressed code and data

The dx.b at the end would reserve some BSS space via the executable header without affecting the binary size.
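To show why reading backwards and writing forwards from the same address is safe, here is a Python model with a trivial run-length format standing in for ZX0 (an assumption made only to keep the sketch short; the function names are hypothetical). The packed bytes are stored reversed and end at `end`, like `.zx0_end`; the reader walks down from there like `-(a0)`, and the writer walks up from the same spot like `(a1)+` into the reserved space, so the two pointers move apart and never overwrite unread input.

```python
def rle_pack(data):
    """Toy format (not ZX0): pairs of (run length, byte value)."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def depack_in_place(mem, start, end):
    """Read packed data backwards from end (like -(a0)), write forwards from end (like (a1)+)."""
    src = dst = end
    while src > start:
        src -= 1
        run = mem[src]
        src -= 1
        value = mem[src]
        mem[dst:dst + run] = bytes((value,)) * run
        dst += run
    return dst                               # one past the last written byte

original = b"J" + b"O" * 5 + b"A" * 7 + b"H" * 3
packed = rle_pack(original)
mem = bytearray(packed[::-1]) + bytearray(len(original))   # reversed data + room (like dx.b)
end = len(packed)
stop = depack_in_place(mem, 0, end)
print(bytes(mem[end:stop]))
```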

Initially, the code got bigger after this change (we can no longer hardcode the bitplane address in the Copper list, plus other little changes), but the approach might still have paid off. I just ran out of time and patience to explore this further…

A final touch

As luck would have it, I had drawn the Tamarian’s mouth so that its bottom part consisted of four consecutive pixels, allowing for a primitive mouth animation:

Since each fat pixel is one byte, these four pixels form a single longword, so I could set them with one timer-based move.l:

    moveq   #-1,@q          ; -1 = $ffffffff (mouth open)
    add.w   #$9f,@blink     ; magic timer constant
    blt.b   .noblnk         ; switch mouths when negative
    moveq   #0,@q           ;  0 = $00000000 (mouth closed)
.noblnk
    move.l  @q,bitmap-12*16+6
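The “magic timer constant” is easier to judge with a quick simulation. This Python sketch emulates `add.w #$9f` followed by `blt`, including the overflow flag that `blt` takes into account (it branches when N xor V is set). It assumes `@blink` starts at 0 and the snippet runs once per call; how often the intro actually runs it isn't stated, and the function name is hypothetical.

```python
def mouth_open_states(calls, step=0x9F, blink=0):
    """Emulate 'add.w #step,blink / blt': True = branch taken = mouth stays open ($ffffffff)."""
    assert 0 < step < 0x8000, "this sketch only models a positive step"
    states = []
    for _ in range(calls):
        old = blink
        blink = (blink + step) & 0xFFFF
        n = bool(blink & 0x8000)                # N: result is negative
        v = not (old & 0x8000) and n            # V: positive + positive gave a negative result
        states.append(n != v)                   # blt branches on N xor V
    return states

states = mouth_open_states(500)
print("first open call:", states.index(True) + 1, "open calls:", sum(states))
```

The mouth toggles roughly every 206 calls (32768 / $9f ≈ 206). On the single step where the counter overflows from $7fxx into negative values, V is set, so `blt` is not taken and the mouth opens one call later than a plain sign test (`bmi`) would make it.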

Download and links
