heckmeck!

Nerd content and
cringe since 1999
Alexander Grupe
Losso/AttentionWhore

Yes, we all � Unicode, or rather, the joy of competing legacy encodings and conversions.

Even when you’re on holidays, a nice soup of encoding chaos can give you a chuckle – especially when it’s presented to you as the bar tab and you first have to scratch your head a little:

Later I got another receipt, this time without mangled-up characters. Interesting! Even if mojibake looks random, charset conversions do not happen randomly. Indeed, we might guess some substitutions when we compare the receipts:

Most Greek letters turn into question marks, with some exceptions: “Α” (capital Alpha) always becomes the Euro sign “€”, and “Ε” (capital Epsilon) turns into the low double quotation mark “„”. Since the number of characters stays the same, this seems to be an 8-bit fuckup (i.e. there’s no UTF-16 or UTF-8 involved, where we would expect multiple garbled output characters for some input characters). The fact that we see “ƒ” (small F-with-hook) and double quotation marks hints at code page 1252 (or code page 1253 for Greek?). Maybe the ancient code page 869 is involved, too?
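The length argument is easy to check in Python. This is only a sketch: the sample word ΜΑΜΟΣ and the helper name `misread` are chosen for illustration, and the two codec pairs are merely plausible candidates, not necessarily what the cash register actually did.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" keeps going when a byte has no mapping in read_as
    return raw.decode(read_as, errors="replace")


word = "ΜΑΜΟΣ"

# Multi-byte source: every Greek letter takes two bytes in UTF-8,
# so an 8-bit reader sees two characters per letter.
from_utf8 = misread(word, "utf-8", "cp1252")

# Single-byte source: one byte per letter, so the length is preserved.
from_cp869 = misread(word, "cp869", "cp1253")

print(len(word), len(from_utf8), len(from_cp869))  # 5 10 5
```

A receipt whose garbled lines are exactly as long as the clean ones therefore points to a single-byte charset on both sides of the conversion.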

Char  Codepoint  UTF-8     CP-1253  CP-869
Α     U+0391     CE 91     C1       A4
Ε     U+0395     CE 95     C5       A8
€     U+20AC     E2 82 AC  A4
„     U+201E     E2 80 9E  0A 84

A-ha! Or should I say: Ηὕρηκα! The messed-up texts were originally CP-869-encoded, and those bytes were then interpreted as Windows-1253!

…or were they? At least for the capital Alpha this would work, but the capital Epsilon would end up as “¨” (diaeresis) instead of lower double quotation marks. A font issue? Also, this requires a 1253-variant where A4 is the Euro sign, not the generic currency symbol “¤”.
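The hypothesis and its loose ends can be replayed with Python’s built-in codecs. Treat this as a rough sketch: the helper name `reinterpret` is made up for illustration, and ISO 8859-7 is tried only as one candidate for a “1253-like” charset with the Euro sign at A4 (its 2003 revision places it there); nothing on the receipts confirms that this is the charset in play.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


for letter in "ΑΕ":
    raw_hex = letter.encode("cp869").hex().upper()
    as_cp1253 = reinterpret(letter, "cp869", "cp1253")
    # Assumption: ISO 8859-7 stands in for the Euro-at-A4 variant
    as_iso8859_7 = reinterpret(letter, "cp869", "iso8859_7")
    print(letter, raw_hex, as_cp1253, as_iso8859_7)
```

With the codec tables shipped in CPython, the capital Alpha (byte A4) comes out as “¤” under cp1253 and as “€” under iso8859_7, while the capital Epsilon (byte A8) stays “¨” in both. So the Euro sign can be explained this way, but the low quotation mark remains open.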

Since I’m on holidays, I consider this riddle half-solved for now, and I’ll celebrate with half a liter of ΜΑΜΟΣ beer! :)
